Sensory stimuli are usually composed of different features (the what) appearing at irregular times (the
when). Neural responses often use spike patterns to represent sensory information. The what is
hypothesised to be encoded in the identity of the elicited patterns (the pattern categories), and the when, in
the time positions of patterns (the pattern timing). However, this standard view is oversimplified. In the real
world, the what and the when might not be separable concepts, for instance, if they are correlated in the
stimulus. In addition, neuronal dynamics can condition the pattern timing to be correlated with the pattern
categories. Hence, timing and categories of patterns may not constitute independent channels of
information. In this paper, we assess the role of spike patterns in the neural code, irrespective of the nature
of the patterns. We first define information-theoretical quantities that allow us to quantify the information
encoded by different aspects of the neural response. We also introduce the notion of synergy/redundancy
between time positions and categories of patterns. We subsequently establish the relation between the what
and the when in the stimulus with the timing and the categories of patterns. To that aim, we quantify the
mutual information between different aspects of the stimulus and different aspects of the response. This
formal framework allows us to determine the precise conditions under which the standard view holds, as
well as the departures from this simple case. Finally, we study the capability of different response aspects to
represent the what and the when in the neural response.

1. Introduction: Patterns in the neural response



Sensory neurons represent external stimuli. In realistic conditions, different stimulus features (for example,
the presence of a predator or a prey) appear at irregular times. Therefore, an efficient sensory system should
not only represent the identity of each perceived stimulus, but also its timing. Colloquially, qualitative
differences between stimulus features have been called the what in the stimulus, whereas the temporal
locations of the features constitute the when. Spike trains can encode both the what and the when, for
example, as a sequence of spike patterns. This idea constitutes a standard view (Theunissen and Miller,
1995; Borst and Theunissen, 1999; Krahe and Gabbiani, 2004), where the timing of patterns indicates when
stimulus features occur, while the pattern identities tag what stimulus features happened (Martinez-Conde
et al., 2002; Alitto et al., 2005; Oswald et al., 2007; Eyherabide et al., 2008). The information provided by the
distinction between different spike patterns is here called category information. In the same manner, the
information transmitted by the timing of spike patterns is here called time information. According to the
standard view, the category and the time information represent the knowledge of the what and the when in
the stimulus, respectively. In this work, we address the conditions under which these assumptions hold, as
well as departures from the standard view.



Many studies have shown the ubiquitous presence of patterns in the neural response. The patterns can be,
for instance, high-frequency burst-like discharges of varying length and latency. Examples have been found
in primary auditory cortex (Nelken et al., 2005), the salamander retina (Gollisch and Meister, 2008), the
mammalian early visual system (DeBusk et al., 1997; Martinez-Conde et al., 2002; Gaudry and Reinagel,
2008), and grasshopper auditory receptors (Eyherabide et al., 2008; Sabourin and Pollack, 2009). In other
cases, the patterns are spike doublets of different inter-spike interval (ISI) duration. Reich et al. (2000)
presented an example of this type in primate V1; and Oswald et al. (2007) found a similar code in the
electrosensory lobe of the weakly electric fish. In yet other cases, patterns are more abstract spatiotemporal
combinations of spikes and silences defined in single neurons (Fellous et al., 2004) and neural populations
(Nadasdy, 2000; Gütig and Sompolinsky, 2006).



If different spike patterns represent different stimulus features, which aspects of the pattern are relevant
to the distinction between the different features? To answer this question, previous studies have classified the
response patterns into different types of categories, depending on different response aspects. The relevance
of each candidate aspect was addressed using what we here define as the category information. For example,
in the auditory cortex, Furukawa and Middlebrooks (2002) assessed how informative patterns were when
categorised in three different ways, using the first spike latency, the total number of spikes, or the variability
in the spike timing. In an even more ambitious study, Gawne et al. (1996) not only compared the
information separately transmitted by response latency and spike count, but also related these two response
properties to two different stimulus features: contrast and orientation, respectively. However, these works
have not addressed how the stimulus timing is represented by the response patterns.



The role of patterns in signaling the occurrence of the stimulus features can only be addressed in those
experiments where the stimulus features appear at irregular times. In this context, previous approaches have
either estimated the time information (Gaudry and Reinagel, 2008; Eyherabide and Samengo, 2010), or
employed other statistical measures such as reverse correlation (Martinez-Conde et al., 2000; Eyherabide
et al., 2008). The time information was calculated as the one encoded by the pattern onsets alone, without
distinguishing between different types of patterns.



In this paper, we analyse the role of timing and categories of patterns in the neural code. To this aim,
we build different representations of the neural response preserving one of these two aspects at a time. This
allows us to quantify the time and the category information separately. We determine the precise meaning
of these quantities and study their variations for different representations of the neural response. Unlike
previous works (Gaudry and Reinagel, 2008; Eyherabide et al., 2009; Foffani et al., 2009), we quantify the
information preserved and lost when the neural response is read out in such a way that only the categories
(timing) of patterns are preserved. As a result, the relevance of each aspect of the neural response is
unambiguously determined.



In principle, the timing and the categories of spike patterns may be correlated. These interactions
may be due to properties of the encoding neuron (such as latency codes; Furukawa and Middlebrooks, 2002;
Gollisch and Meister, 2008), properties of the decoding neuron (when reading a pattern-based code; Lisman,
1997; Reinagel et al., 1999), the convention used to assign a time reference to the patterns (Nelken et al.,
2005; Eyherabide et al., 2008), or the convention used to identify the patterns from the neural response
(Fellous et al., 2004; Alitto et al., 2005; Gaudry and Reinagel, 2008). A statistical dependence between
timing and categories of patterns may, for example, introduce redundancy between the time and category
information. Thus, the same information may be contained in different aspects of the response (categorical
or temporal aspects). In addition, the statistical dependence might also induce synergy, in which case
extracting all the information about the what and the when requires the simultaneous read-out of both
aspects. The presence of synergy and redundancy between the time and category information may affect
the way each of them represents the what and the when in the stimulus.



In the present study, we provide a formal framework to gain insight into the interaction between the
timing and the categories of patterns for different neural codes. We formally define the what and the when 
as representations of the stimulus preserving only the identities and timing of stimulus features, respectively. 
We then establish the conditions under which the pattern categories encode the what in the stimulus, and the 
timings the when. We also study departures from this standard interpretation, in particular, when the time 
position of patterns depends on their internal structure. We show the impact of this dependence on both the 
link with the what and the when and the relative relevance of the timing and categories of patterns. Our 
study is therefore intended to motivate more systematic explorations of the neural code in sensory systems. 
2. Methods



2.1. Reduced representations of the neural response 



A representation is a description of the neural response. Formally, it is obtained by transforming the
recorded neural activity through a deterministic mapping. Throughout this paper, the expressions
"deterministic mapping" and "function" are used as synonyms. We only consider functions that transform the
unprocessed neural response U into sequences of events $e_i = (t_i, c_i)$, characterised by their time positions
($t_i$) and categories ($c_i$). An event is a definite response stretch. Based on their internal structure, events are
classified into different categories, as explained later in this section. Individual spikes may be regarded as
the simplest events. In this case, the sequence of events is called the spike representation (see Figure 1A),
comprising events belonging to a single category: the category "spikes".



From the spike representation, we can define more complex events, hereafter called patterns (see bold
symbols in the spike representation in Figure 1A). Patterns may be defined in terms of spikes, bursts or
ISIs (Alitto et al., 2005; Luna et al., 2005; Oswald et al., 2007; Eyherabide et al., 2008). They may
involve one or several neurons. Examples of population patterns are coincident firing, precise firing events
and sequences, or distributed patterns (Hopfield, 1995; Abeles and Gat, 2001; Reinagel and Reid, 2002;
Gütig and Sompolinsky, 2006). The sequence of patterns obtained by transforming the spike representation
is called the pattern representation. Analogously, the sequences of patterns characterised only by either
their time positions or their categories constitute the time representation and the category representation,
respectively. Details on how to build these sequences are explained below. For simplicity, these sequences are
represented in Figure 1 as sequences of symbols n, indicating specific events (n > 0) and silences (n = 0).



Formally, to obtain the spike representation (R), the unprocessed neural response (U) is transformed into
a sequence of spikes (1) and silences (0) (Figure 1A). The time bin is taken small enough to include at most
one spike. Differences in shape of action potentials are ignored, while their time positions are preserved,
with temporal precision limited by the bin size. As a result, several sequences of action potentials may be
represented by the same spike sequence (see Figure 1B).

In the pattern representation (B), the spike sequence is transformed into a sequence of silences (n = 0)
and spike patterns (n = b > 0), distinguished solely by their category b. For example, in Figure 1, patterns
are defined as response stretches containing consecutive spikes separated by at most one silence. The time
position of each pattern is defined as that of its first spike, whereas patterns with the same
number of spikes are grouped into the same pattern category. Only information about pattern categories and
time positions remains (compare the bold symbols in the spike and the pattern representation in Figure 1A).
By ignoring differences among patterns within categories, several spike sequences can be mapped into the
same pattern sequence, as shown in Figure 1C.
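
The burst rule used in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pattern_representation` and the parameter `max_gap` are hypothetical and do not come from the original analysis, and the input is assumed to be a binary spike sequence with one symbol per time bin.

```python
def pattern_representation(spikes, max_gap=1):
    """Map a binary spike sequence onto a pattern sequence.

    A pattern is a stretch of spikes in which consecutive spikes are
    separated by at most `max_gap` silent bins. Each pattern is placed at
    the bin of its first spike and labelled by its spike count (its
    category); all other bins are set to 0 (silence).
    """
    patterns = [0] * len(spikes)
    onset = None        # bin of the first spike of the current pattern
    last_spike = None   # bin of the most recent spike
    count = 0           # number of spikes in the current pattern
    for t, x in enumerate(spikes):
        if x != 1:
            continue
        if onset is not None and t - last_spike - 1 <= max_gap:
            count += 1                      # spike joins the current pattern
        else:
            if onset is not None:
                patterns[onset] = count     # close the previous pattern
            onset, count = t, 1             # open a new pattern
        last_spike = t
    if onset is not None:
        patterns[onset] = count             # close the last pattern
    return patterns


# Two different spike sequences that map onto the same pattern sequence.
spikes_a = [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
spikes_b = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1]
print(pattern_representation(spikes_a))
print(pattern_representation(spikes_b))
```

Both calls yield the same sequence, with a three-spike pattern, an isolated spike and a two-spike pattern, which illustrates the many-to-one character of the mapping.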

The time position of patterns is measured with respect to a common origin, in general, the beginning
of the experiment. It can be defined, for example, as the first (or any other) spike of the pattern
or as the mean response time (Lisman, 1997; Nelken et al., 2005; Eyherabide et al., 2009). Patterns are
classified into categories according to different aspects describing their internal structure, such as the
latency, the number of spikes or the spike-time dispersion (Gawne et al., 1996; Theunissen and Miller, 1995;
Furukawa and Middlebrooks, 2002; Gollisch and Meister, 2008). Notice that latencies are usually defined
with respect to the stimulus onset, which is not a response property (Chase and Young, 2007).
Thus, latencies and timing of spike patterns are different concepts, and the latency cannot be read out
from the neural response alone. However, latencies have also been defined with respect to the local field
potential (Montemurro et al., 2008) or population activity (Chase and Young, 2007). These definitions can
be regarded as internal aspects of spatiotemporal spike patterns (Theunissen and Miller, 1995; Nadasdy,
2000).

Figure 1: Representations of the neural response. (A) In the spike representation, only the timing of
action potentials is described, discarding the fine structure of the voltage traces. In the pattern
representation, only the timing and categories of spike patterns remain. This representation is further
transformed, to obtain the time and the category representations. The time (category) representation only
keeps information about the timing (categories) of the spike patterns. (B), (C), (D) and (E) Each successive
transformation of the neural response through a deterministic function simultaneously reduces both the
variability in the neural response and the number of possible responses.



Categories of patterns can be built by discretizing the range of one or several internal aspects. For
example, Reich et al. (2000) defined patterns as individual ISIs, and categorised them in terms of their
duration. Three categories were considered, depending on whether the ISI was short, medium or large. In
other cases, patterns may be sequences of spikes separated by less than a certain time interval. Categories of
patterns can then be defined, depending on the number of spikes in each pattern (Reinagel and Reid, 2002;
Martinez-Conde et al., 2000; Eyherabide and Samengo, 2010), as shown in Figure 1, or depending on the
length of the first ISI (Oswald et al., 2007). The theory developed in this paper is valid irrespective of the
way in which one chooses to define the pattern time positions and the pattern categories.

From the pattern sequence, we obtain the time representation (T) by only keeping the time positions
of patterns. As a result, the neural response is transformed into a sequence of silences (0) and events
(1), indicating the occurrence of a pattern in the corresponding time bin and disregarding its category.
The temporal precision of the pattern representation is preserved in the time representation. However, by
ignoring differences between categories, different pattern sequences can be mapped into the same time
representation, as illustrated in Figure 1D.

The category representation (C) is complementary to the time representation. It is obtained from the
pattern sequence, by only keeping information about the categories of patterns while ignoring their time
positions. The neural response is transformed into a sequence of integer symbols n > 0, representing
the sequence of pattern categories in the response. The exact time position of patterns is lost: only their
order remains. Therefore, several pattern sequences may be mapped onto the same category sequence, as
indicated in Figure 1E.
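
The two reductions of the pattern sequence can be written down directly. The following Python sketch is illustrative only: the function names and the example sequences are hypothetical, and the pattern sequence is assumed to be a list with one integer symbol per time bin (0 for silence, b > 0 for a pattern of category b).

```python
def time_representation(patterns):
    """Keep only the time positions of patterns: 1 where a pattern occurs, 0 elsewhere."""
    return [1 if n > 0 else 0 for n in patterns]


def category_representation(patterns):
    """Keep only the ordered list of pattern categories, dropping silences and timing."""
    return [n for n in patterns if n > 0]


# Hypothetical pattern sequences.
b1 = [0, 3, 0, 0, 0, 0, 1, 0, 2, 0]
b2 = [0, 1, 0, 0, 0, 0, 2, 0, 3, 0]   # same timing as b1, different categories
b3 = [3, 0, 0, 1, 0, 0, 0, 0, 0, 2]   # same category order as b1, different timing

print(time_representation(b1) == time_representation(b2))          # True
print(category_representation(b1) == category_representation(b3))  # True
```

The sequences `b1` and `b2` collapse onto the same time representation, whereas `b1` and `b3` collapse onto the same category representation, so neither reduced representation alone identifies the pattern sequence.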



The spike (R), pattern (B), time (T) and category (C) representations are derived through functions that
depend only on the previous representation, as denoted by the arrows in Figure 1A, and formally expressed
by the following equations:

\begin{align}
\text{Neural response} \quad & U \ \text{(experiment)} \tag{1a} \\
\text{Spike representation} \quad & R = h_{U \to R}(U) \tag{1b} \\
\text{Pattern representation} \quad & B = h_{R \to B}(R) \tag{1c} \\
\text{Time representation} \quad & T = h_{B \to T}(B) \tag{1d} \\
\text{Category representation} \quad & C = h_{B \to C}(B), \tag{1e}
\end{align}

where $h_{X \to Y}$ represents the function h that is applied to the representation X to obtain the representation Y.
These transformations progressively reduce both the variability in the neural response and the number of
possible responses

\begin{align}
H(U) \geq H(R) \geq H(B) &\geq \begin{cases} H(T) \\ H(C) \end{cases}, \tag{2a} \\
|U| \geq |R| \geq |B| &\geq \begin{cases} |T| \\ |C| \end{cases}, \tag{2b}
\end{align}

where H(X) means the entropy H of the set X (Cover and Thomas, 1991), and |X| indicates its cardinality,
i.e. the number of elements of the set X.



2.2. Calculation of mutual information rates 



The mutual information I(X; S) between two random variables X and S is defined as the reduction in the
uncertainty of one of the random variables due to the knowledge of the other. It is formally expressed as a
difference between two entropies

\[
I(X; S) = H(X) - H(X|S), \tag{3}
\]

where H(X) is the total entropy of X and H(X|S) represents the conditional or noise entropy of X provided
that S is known (Cover and Thomas, 1991).

We estimate the mutual information between the stimulus S and a representation X of the neural
response using the so-called Direct Method, introduced by Strong et al. (1998). The unprocessed neural
response U is divided into time intervals $U^{\tau}$ of length $\tau$. Each response stretch $U^{\tau}$ is then transformed into
the discrete-time representation $X^{\tau}$ ($X^{\tau} = h_{U \to X}(U^{\tau})$), also called words. As a result

\[
I(S; U^{\tau}) \geq I(S; X^{\tau}). \tag{4}
\]

This inequality is valid for every time interval of length $\tau$ (Cover and Thomas, 1991) and is not limited
to the asymptotic regime for long time intervals, like in previous calculations (Gaudry and Reinagel, 2008;
Eyherabide et al., 2009). The mutual information calculated with words of length $\tau$ only quantifies properly
the contribution of spike patterns that are shorter than $\tau$. In order to include the correlations between these
patterns, even longer words are needed. Therefore, in this study, the maximum window length ranged
between 3 and 4 times the maximum pattern duration.

The total entropy $H(X^{\tau})$ and noise entropy $H(X^{\tau}|S)$ are estimated using the distributions of words $X^{\tau}$
unconditional ($P(X^{\tau})$) and conditional ($P(X^{\tau}|S)$) on the stimulus S, respectively. The mutual information
$I(S; X^{\tau})$ is computed by subtracting $H(X^{\tau}|S)$ from $H(X^{\tau})$ (Eq. 3). This calculation is repeated for increasing
word lengths, and the mutual information rate I(S; X) between the stimulus S and a representation X of the
neural response is estimated as

\[
I(S; X) = \lim_{\tau \to \infty} \frac{I(S; X^{\tau})}{\tau}. \tag{5}
\]

This quantity represents the mutual information per unit time when the stimulus and the response are read
out with very long words. In this work we always calculate mutual information rates unless it is otherwise
indicated. However, for compactness, we sometimes refer to this quantity simply as "information".
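
A minimal Python sketch of the word-based estimate for a single word length is given below. It makes several assumptions that are not stated in the text: all trials are responses to repeated presentations of the same stimulus, so that conditioning on S is approximated by conditioning on the position of the word within the trial; words do not overlap; and entropies are computed with the naive plug-in estimator, without bias correction and without the extrapolation to long words of Eq. 5. The names `entropy`, `direct_information` and `information_rate` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(words):
    """Plug-in entropy (bits) of the empirical distribution of `words`."""
    counts = Counter(words)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())


def direct_information(trials, word_len):
    """Plug-in estimate of I(S; X^tau), in bits per word.

    `trials` is a list of equally long symbol sequences, one per repeated
    presentation of the same stimulus; `word_len` is the word length in bins.
    """
    n_bins = len(trials[0])
    starts = range(0, n_bins - word_len + 1, word_len)  # non-overlapping words
    # words_at[k] collects, across trials, the word starting at the k-th position
    words_at = [[tuple(trial[t:t + word_len]) for trial in trials] for t in starts]
    total = entropy([w for words in words_at for w in words])          # H(X^tau)
    noise = sum(entropy(words) for words in words_at) / len(words_at)  # H(X^tau | S)
    return total - noise


def information_rate(trials, word_len, dt):
    """Information per unit time for one word length (no limit taken)."""
    return direct_information(trials, word_len) / (word_len * dt)


# A perfectly reproducible toy response: the noise entropy vanishes.
reliable = [[0, 1, 1, 0]] * 4
print(direct_information(reliable, word_len=2))
```

In practice this calculation would be repeated for increasing word lengths and corrected for sampling bias, as described in the text; the sketch only shows how the two entropies entering Eq. 3 are obtained from the word distributions.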



The estimation of information suffers from both bias and variance (Panzeri et al., 2007). In this work, the
sampling bias of the information estimation was corrected using the NSB approach for the experimental data
(Nemenman et al., 2004). For the simulations, we used instead the quadratic extrapolation (Strong et al.,
1998), due to its simplicity and the possibility of generating large amounts of data. The standard deviation
of the information was estimated from the linear extrapolation to infinitely long words (Rice, 1995). The
bias correction was always lower than 1.5 % and the standard deviation, always lower than 1 %, for all
simulations and all word lengths; thus error bars are not visible in the figures. When comparisons between
information estimations were needed, one-sided t-tests were performed (Rice, 1995).
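
The idea behind the quadratic extrapolation can be illustrated with a short Python sketch. It assumes that the plug-in estimate obtained from N samples behaves as I(N) ≈ I_∞ + a/N + b/N², and recovers I_∞ by extrapolating to 1/N → 0. For simplicity, the sketch uses exactly three sample sizes, so that the quadratic is fixed by interpolation; with more data fractions one would normally use a least-squares fit instead. The function name and the numbers in the example are hypothetical and synthetic, and are not taken from the simulations of this paper.

```python
def quadratic_extrapolation(sample_sizes, estimates):
    """Extrapolate information estimates to infinite sample size.

    Fits I(N) = I_inf + a / N + b / N**2 through three (N, I(N)) pairs
    and returns I_inf, i.e. the value of the quadratic at 1 / N = 0
    (Lagrange interpolation evaluated at zero).
    """
    if len(sample_sizes) != 3 or len(estimates) != 3:
        raise ValueError("this sketch expects exactly three sample sizes")
    x = [1.0 / n for n in sample_sizes]
    i_inf = 0.0
    for k in range(3):
        weight = 1.0
        for j in range(3):
            if j != k:
                weight *= (0.0 - x[j]) / (x[k] - x[j])
        i_inf += estimates[k] * weight
    return i_inf


# Synthetic estimates with a known asymptote of 2 bits.
sizes = [250, 500, 1000]
biased = [2.0 + 5.0 / n + 30.0 / n ** 2 for n in sizes]
print(quadratic_extrapolation(sizes, biased))
```

The size of the correction, that is, the difference between the estimate obtained with all the data and the extrapolated value, gives an indication of how strongly the estimate is affected by limited sampling.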



2.3. Simulated data 



Simulations are used to exemplify the theoretical results and to gain additional insight into how different
response conditions affect information transmission in well-known neural models and neural codes. They
represent highly idealised cases, with unrealistically long runs and large numbers of trials, which allow us to
obtain reliable information estimates in a transparent manner. First, we define
the parameters used in the simulations and relate them to the specific aspects of the stimulus and the
response. Then, we report the specific values for the parameters.
2.4. General description



In the simulations, the stimulus consists of a random sequence of instantaneous discrete events, here called
stimulus features. Each stimulus feature is characterised by specific physical properties, as for example, the
colour of a visual stimulus, the pitch of an auditory stimulus, the intensity of a tactile stimulus, or the odour
of an olfactory stimulus (Poulos et al., 1984; Rolen and Caprio, 2007; Nelken, 2008; Mancuso et al., 2009).
In the real world, however, features are not necessarily discrete. If they are continuous, one can discretize
them by dividing their domain into discrete categories (Martinez-Conde et al., 2002; Eyherabide et al., 2008;
Marsat et al., 2009). The present framework sets no upper limit to the number of features, nor to
the similarity between different categories. In addition, features might not be instantaneous but rather
develop in extended time windows, as happens with the chirps in the weakly electric fish (Benda et al.,
2005), the oscillations in the electric field potential (Oswald et al., 2007) and the amplitude of auditory
stimuli (Eyherabide et al., 2008). In order to capture the duration of real stimuli, in the simulations we
define a minimum inter-feature interval $\lambda^{s}_{\min}$ for each feature s. After the presentation of a feature s, no
other feature may appear in an interval lower than or equal to $\lambda^{s}_{\min}$.

In the simulated data, each stimulus feature elicits a neural response (see Figure 2A). Since in this paper
we are interested in pattern-based codes, each feature generates a pattern of spikes belonging to some pattern
category. The correspondence between stimulus features and pattern categories may be noisy. We consider
both categorical noise (the pattern category varies from trial to trial) and temporal noise (the timing of the
pattern varies from trial to trial). In Figure 2B, we show examples of all noise conditions using burst-like
response patterns. In those examples, categories were defined according to the number of spikes in each
burst.



Symbolically, the stimulus S is represented as a sequence of symbols s, one per time bin $\Delta t$. Each s is
drawn randomly from the set of all possible outcomes $\Sigma_s = \{0, 1, \ldots, N_s\}$. The symbol s = 0 indicates a
silence (the absence of a feature), whereas s > 0 tags the presence of a given feature. Each feature s elicits
a response pattern r, drawn from the set $\Sigma_r$ of all possible patterns, with probability $P_r(r|s)$. The response
pattern r may appear with latency $\mu_r$, which might depend on the evoked pattern r. A neural response R,
elicited by a sequence of stimulus features, may be composed of several response patterns (see bold symbol
sequences in Figure 2A).

Figure 2B shows example neural codes with no noise (upper left panel), categorical noise alone (upper
right), temporal noise alone (lower left), and a mixture of categorical and temporal noise (lower right). The
categorical noise is defined by $P_b(b|s)$, quantifying the probability that a response category b be elicited in
response to stimulus s (see Appendix A for the relation between $P_b(b|s)$ and $P_r(r|s)$). The temporal noise
is implemented as jitter in the pattern onset time. That is, temporal jitter affects the pattern as a whole,
displacing all spikes in the pattern by the same amount of time. The temporal displacement is drawn from
a uniform distribution in the interval $(-\sigma_b, \sigma_b)$, where the jitter $\sigma_b$ may depend on the pattern b.

Figure 2: Simulations: Design and construction. (A) Example of a stimulus stretch and the elicited
response. The stimulus is depicted as an integer sequence of silences (0) and features (s > 0), one symbol
per time bin of size $\Delta t$. After a feature arrival, the stimulus remains silent for a period $\lambda^{s}_{\min}$. The response
is represented as a binary sequence of spikes (1) and silences (0). Each stimulus feature elicits a response
pattern: a burst containing n spikes. Different categories correspond to different intra-burst spike counts.
(B) Examples of different response conditions. Upper panels: no temporal jitter; lower panels: the pattern,
as a whole, is displaced due to temporal jitter; left panels: no categorical noise; right panels: each stimulus
feature elicits pattern responses belonging to more than a single category.



2.4.1. Details and parameters 



Simulated neural responses consisted of four different patterns, elicited by a stimulus with four different
features. The response patterns were bursts of spikes, containing between 1 and 4 spikes. The intra-burst
ISI was $\gamma_{\min} = 2$ ms. However, since the neural response is transformed into the pattern representation, the
results are valid irrespective of the nature of the patterns (see section 2.1). The stimulus was presented 200
times, each presentation lasting 2000 s. The minimum inter-feature time interval was $\lambda_{\min} = 12$ ms. In all
cases, no interference between patterns was considered (see section 3.8). We used a time bin of size
$\Delta t = 1$ ms.



Simulation 1: This simulation is used to illustrate the effect of using different representations of the
neural response, and to compare an ideal situation where the correspondence between features and patterns
is known, with a more realistic case, where the neural code is unknown. The temporal jitter was $\sigma = 1$ ms
and the latency was $\mu = 1$ ms. Stimulus feature probabilities p(s) were set to: p(1) = 0.06, p(2) = 0.04,
p(3) = 0.03, p(4) = 0.02. Categorical noise ($p(b|s)$, $b \neq s$): $p(i+1|i) = 0.1\,(4 - i)$, $0 < i < 4$; otherwise
$p(b|s) = 0$.


Simulation 2: These simulations are used to address the role of the timing and category of patterns in
the neural code, and to study the relation with the what and the when in the stimulus. The latency was
$\mu = 1$ ms. When present, temporal jitter was set to $\sigma = 1$ ms and categorical noise ($p(b|s)$, $b \neq s$) was given
by: $p(i+1|i) = p(i|i+1) = p(3|1) = p(2|4) = 0.1$, $0 < i < 4$; otherwise $p(b|s) = 0$. Stimulus feature
probabilities were $p(s) = 0.025$, $0 < s \leq 4$.



2.5. Electrophysiology 



Experimental neural data were provided by Ariel Rokem and Andreas V. M. Herz; they performed
intracellular recordings in vivo, on the auditory nerve of Locusta migratoria (see Rokem et al., 2006, for details).
Auditory stimuli consisted of a 3 kHz carrier sine wave, amplitude modulated by a low-pass filtered signal
with a Gaussian distribution. The AM signal had a mean amplitude of 53.9 dB, a 6 dB standard deviation
and a cut-off frequency of 25 Hz (see Figure 3A, upper panel). Each stimulation lasted for 1000 ms with a
pause of 700 ms between repeated presentations of the stimulus, in order to minimise the influence of slow
adaptation. To eliminate fast adaptation effects, the first 200 ms of each trial were discarded. The recorded
response (see Figure 3A, lower panel) consisted of 479 trials, with a mean firing rate of 108 ± 6 spikes/s
(mean ± standard deviation across trials). Burst activity was observed and associated with specific features
in the stimulus (see Eyherabide et al., 2008, for the analysis of burst activity in the whole data set). Bursts
contained up to 14 spikes; Figure 3B shows the firing probability distribution as a function of the intra-burst
spike count.



3. Results 



3.1. Information transmitted by different representations of the neural response: spike and pattern 
information 

In order to understand how stimuli are encoded in the neural response, the recorded neural activity U is
transformed into several different representations. Each representation keeps some aspects of the original
neural response while discarding others. The spike representation R is probably the most widely used (see
section 2.1). We define the spike information I(S; R) as the mutual information rate between the stimulus S
and the spike representation R of the neural response.

Figure 3: Experimental data from a grasshopper auditory receptor neuron. (A) Upper panel: Sample
of the amplitude modulation of the sound stimulus used in the recordings. Lower panel: Response to 30 of
479 repeated stimulus presentations showing conspicuous burst activity. Each vertical line represents a
single spike. (B) Probability of firing a burst with n intra-burst spikes, in a time bin of size $\Delta t = 1$ ms. Isolated
spikes (n = 1) and burst activity (n > 1) represent 49.4% and 50.6% of the firing events, respectively.

The spike sequence can be further transformed into a sequence of patterns of spikes, called the pattern
representation B. To that end, all possible patterns of spikes are classified into pre-defined categories, for
example, burst codes, ISI codes, etc. (see section 2.1 and references therein). We define pattern information
I(S; B) as the information about the stimulus S, carried by the sequence of patterns B.

The pattern information cannot be greater than the spike information, which in turn cannot be greater
than the information in the unprocessed neural response

\[
I(S; B) \leq I(S; R) \leq I(S; U). \tag{6}
\]

This result can be directly proved from the deterministic relation between U, R and B (Eqs. 1) and the
data processing inequality (Cover and Thomas, 1991). Notwithstanding, several neuroscience papers have
reported data contradicting Eq. 6 (see section 4.3). Intuitively, out of all the information carried by the
unprocessed neural response, the spike information only contains the information preserved in the spike
timing. Analogously, out of the information carried in the spike representation, the pattern information only
preserves the information carried by both the time positions and the categories of the chosen patterns.
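
The effect of a deterministic reduction on the mutual information can be checked numerically on a toy example. The Python sketch below is purely illustrative: the joint distribution, the merging rule and the function names are hypothetical, and single symbols stand in for whole response sequences.

```python
from collections import defaultdict
from math import log2


def mutual_information(joint):
    """Mutual information (bits) of a joint distribution {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def reduce_response(joint, mapping):
    """Joint distribution of (s, mapping(r)) obtained from that of (s, r)."""
    reduced = defaultdict(float)
    for (s, r), p in joint.items():
        reduced[(s, mapping(r))] += p
    return dict(reduced)


# Hypothetical joint distribution of a stimulus feature s and a response r.
joint_sr = {(1, 1): 0.4, (1, 2): 0.1, (2, 2): 0.1, (2, 3): 0.4}

# Deterministic reduction that merges the responses 2 and 3 into one category.
joint_sb = reduce_response(joint_sr, lambda r: min(r, 2))

i_sr = mutual_information(joint_sr)
i_sb = mutual_information(joint_sb)
print(i_sr, i_sb, i_sb <= i_sr)
```

In this example the merged representation carries less information than the original one, because the distinction between the responses 2 and 3 was informative about the stimulus. A reduction that only merges responses with identical conditional stimulus probabilities would leave the information unchanged.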



3.2. Choosing the pattern representation 



In this paper, we quantify the amount of time and category information encoded by pattern-based codes. 
This information depends critically on the choice of the pattern representation. In this subsection, we 
discuss how to evaluate whether a given choice is convenient or not. One can choose any set of pattern 
categories to define the alphabet of the pattern representation. Some choices, however, preserve more 
information about the stimulus than others. The comparison between the information carried by different 
pattern representations gives insight into how relevant the preserved structures are to information transmission
(Victor, 2002; Nelken and Chechik, 2007), i.e. formally, into whether they constitute sufficient statistics
(Cover and Thomas, 1991). A suitable representation should reduce the variability in the neural response
due to noise, while preserving the variability associated with variations in the encoded stimulus. Thus, any 
representation preserving less information than the spike information is neglecting informative variability. 
In addition, one may also be interested in a neural representation that can be easily or rapidly read out, 
or that is robust to environmental changes, etc. The chosen neural representation typically results from a 
trade-off between these requirements. 



Here we focus on analysing whether the chosen representation alters the correspondence between the
stimulus and the response. For us, a good representation is one where the informative variability is
preserved, and the non-informative variability is discarded. As an example, we analyse two different situations
(Figure 4). In panel A, we use simulated data, where we know exactly how the neural code is structured. We
can therefore compare the performance of the spike representation with that of two pattern representations: one
intentionally tailored to capture the true neural code that generated the data, and another
discarding some informative variability. The neural response consists of a sequence of four different
patterns, associated with each of four stimulus features, in the presence of temporal jitter and categorical
noise (see section 2.4.1, Simulation 1). In panel B, we study experimental data (see section 2.5), so the
neural code is unknown. Therefore, in this case we compare the spike representation with two candidate
pattern representations, without knowing a priori which is the most suitable.

For both simulation and experimental data, we estimated the information conveyed by the spike
representation R; a pattern representation B^α, where all bursts are grouped into categories according to their
intra-burst spike count; and a second pattern representation B^β, with only two categories comprising isolated
spikes and complex patterns. This is shown in Figure 4, where the information per unit time is plotted as
a function of the window size used to read the neural response. The representations are related through
functions, in such a way that B^β is a transformation of B^α, which is in turn a transformation of R. Therefore,
I(S; B^β) ≤ I(S; B^α) ≤ I(S; R), for all finite response windows (see Eq. 6). Nevertheless, notice that B^β may
be a faster-to-read code than B^α, since the latter requires a time window long enough to distinguish not
only the differences between isolated spikes and bursts, but also the differences among bursts of different
categories.

Figure 4: Information per unit time transmitted by different choices of patterns. The spike
representation (R) is transformed into a sequence of patterns grouped in categories according to the intra-pattern spike
count (B^α), which is further transformed into a sequence of patterns classified as isolated spikes or complex
patterns (B^β). Comparing the amount of information transmitted gives insight into the relevance of the
structures preserved in the representations. (A) Simulation of a neural response with four different patterns,
elicited by a stimulus with four different features, in the presence of temporal jitter and categorical noise (see
section 2.4.1, Simulation 1 for details). (B) Experimental data from a grasshopper auditory receptor neuron.
In all cases, error bars < 1% (smaller than the size of the data points).



In the simulation (Figure 4A), the information carried by B^α is equal to the spike information (I_Sim(S; R) =
I_Sim(S; B^α) = 254.2 ± 0.2 bits/s, one-sided t-test, p(10) = 0.5). This is expected since, by construction, the
neural code used in the simulations is, indeed, B^α. Therefore, in this case, B^α is a lossless representation.
The choice of an adequate representation is more difficult in the experimental example (Figure 4B), where
the neural code is not known beforehand. In this case, B^α preserves less information than the spike sequence
(I_Exp(S; R) = 133 ± 4 bits/s, I_Exp(S; B^α) = 121 ± 3 bits/s, one-sided t-test, p(10) = 0.004). The information
I(S; B^α) represents 91 % of the spike information. In general, whether this amount of information is
acceptable or not depends on whether the loss is compensated by the advantages of attaining a reduced
representation of the response (Nelken and Chechik, 2007).



Distinguishing only between isolated spikes and bursts (B^β) diminishes the information considerably in
both examples (one-sided t-test, p(10) < 0.001, both cases). In the simulation, the information carried by
B^β is I_Sim(S; B^β) = 208.7 ± 0.6 bits/s, representing about 82.1 % of the spike information. This is expected
since, by construction, different stimulus features are encoded by different patterns. For the experimental
data, I_Exp(S; B^β) = 91 ± 7 bits/s, representing about 68 % of the spike information. In both examples, the
representation B^α is "more sufficient" than B^β. The difference I(S; B^α) - I(S; B^β) constitutes a quantitative
measure of the role of distinguishing between bursts of 2, 3, . . . , n spikes, provided that the distinction
between isolated spikes and bursts has already been made (I(S; B^α | B^β)). However, B^α still preserves other
response aspects, such as pattern timing, number of patterns, etc. In what follows, we study the role of
different response aspects in information transmission.

3.3. Informative aspects of the neural response 

The pattern representation may preserve one or several aspects of the neural response that could, in
principle, encode information about the stimulus. More specifically, if the response is analysed using windows of
duration τ, there are several candidate response aspects that might be informative, namely:

a- the number of patterns in the window (number of events - Figure 5A)

b- the precise timing of each pattern in the window (time representation - Figure HJ)) 

c- the pattern categories present in the window with no specification of their ordering (response set of
categories - Figure 5B)

d- the temporally-ordered pattern categories in the window (category representation - Figure CLE)- 



Figure 5: Identifying the information carriers in the neural response. (A) The representation η of the
neural response is obtained by transforming the pattern sequence such that only the number of events is
preserved. (B) By transforming the pattern sequence into the representation θ, the information about the
categories present in the neural response is preserved, while their order of occurrence is disregarded.

We find that these aspects are related through deterministic functions. Indeed, aspect a can be uniquely
determined from aspects b, c or d. Thus, the information transmitted by aspect a is also carried by any of the
other aspects. In the same manner, aspect c can be determined from d. However, in Appendix B we prove
that the number of patterns in the window (aspect a) makes a vanishing contribution to the information rate.
That is, although aspect a might be informative for a finite window of length τ, its contribution becomes
negligible in the limit of long windows. Surprisingly, the unordered set of pattern categories (aspect c) also
makes no contribution to the information rate, as shown in Appendix C. Moreover, the entropy rates of
both aspects tend to zero in the limit of long time windows. Therefore, their information rate with respect
to any other aspect, of the stimulus or of the neural response, vanishes as the window size increases.
We thus do not discuss aspects a and c any further.

This is not the case for response aspects b and d. In other words, they may sometimes be informative;
their definitions do not constrain them to be non-informative. Therefore, in what follows, we transform the
pattern representation into two other representations preserving the precise timing of each pattern (the time
representation) and the temporally-ordered pattern categories (the category representation). Our goal is to
determine in which way the precise timing of each pattern conveys information about the time positions of
stimulus features (the when), and how the temporally ordered pattern categories provide information about
the identity of the stimulus features (the what).

3.4. Time and category information 

We define the time information I(S; T) as the mutual information rate between the stimulus S and the time
representation T. In addition, we define the category information I(S; C) as the mutual information rate
between the stimulus S and the category representation C. The category information is novel: unlike previous
studies (Gaudry and Reinagel, 2008; Eyherabide et al., 2009), which it complements, it allows us to
address the relevance of pattern categories in the neural code (see section 3.5). Since both T and C are
transformations of the pattern representation B (see their defining equations), the time and category information cannot be
greater than the pattern information, i.e.

$$ I(S;T) \le I(S;B), \qquad I(S;C) \le I(S;B). \tag{7} $$



When T and C are read out simultaneously, the pair (T, C) carries the same information as the pattern
sequence B (I(T, C; S) = I(B; S)). In fact, B and the pair (T, C) are related through a bijective function. To
prove this, consider any pattern representation B_i of a neural response U_i. The pair (T_i, C_i) associated with
U_i is a function of B_i (see the equations defining T and C). Conversely, given the pair (T_i, C_i) associated with U_i, all the information
about the time positions and categories of patterns present in U_i is available, and thus B_i is uniquely
determined. Notice that the pairs (T, C) are a subset of the Cartesian product T × C.
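
The bijection between B and the pair (T, C) can be illustrated with a short sketch. It assumes a hypothetical encoding in which the pattern sequence is a list with one integer per time bin, where 0 means "no pattern" and a positive integer labels the pattern category; the function names are illustrative only.

```python
def split_patterns(b):
    """Split a binned pattern sequence into its time and category representations."""
    t = [1 if x != 0 else 0 for x in b]   # the when: which bins hold a pattern
    c = [x for x in b if x != 0]          # the what: temporally ordered categories
    return t, c

def merge_patterns(t, c):
    """Rebuild the pattern sequence from a compatible (time, category) pair."""
    categories = iter(c)
    return [next(categories) if flag else 0 for flag in t]

# Illustrative pattern sequence (0 = empty bin, n > 0 = pattern of category n).
b = [0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0]
t, c = split_patterns(b)
print(t)
print(c)
print(merge_patterns(t, c) == b)
```

Only pairs in which the number of marked bins in T equals the length of C can be merged, which is why the valid pairs form a subset of the Cartesian product.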

The time positions of patterns may depend on their categories, and vice versa. To explore this
relationship, and how it affects the transmitted information, we separate the pattern information as

$$ I(B;S) = I(S;T) + I(S;C) + \Delta_{SR}, \tag{8} $$

where Δ_SR represents the synergy/redundancy between the time and the category representations, defined
by

$$ \Delta_{SR} = -I(S;T;C). \tag{9} $$



Here, I(X; Y; Z) = I(X; Y) - I(X; Y|Z) is called the triple mutual information (Cover and Thomas, 1991; Tsujishita,
1995). If Δ_SR is positive, time and category information are synergistic: more information is available when
T and C are read out simultaneously. Conversely, if Δ_SR is negative, time and category information are
redundant. The proof of Eq. 8 and Eq. 9 is shown in Appendix D. Previous studies have already defined
the synergy/redundancy for populations of neurons (Schneidman et al., 2003). It has also been applied
to single neurons, to determine how different aspects of response patterns encode the identity of single
stimulus features (Furukawa and Middlebrooks, 2002; Nelken et al., 2005). Here we extend the concept to
encompass also dynamic stimuli where stimulus features arrive at random times, as well as arbitrary
patterns, defined in time and/or across neurons.
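
The sign convention of Δ_SR can be made concrete with two textbook cases. The sketch below assumes single-window probabilities (bits, not rates) and two hypothetical joint tables p[s, t, c]: an XOR code, where S is only recoverable from T and C together, and a copy code, where T and C both repeat S.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def synergy_redundancy(p_stc):
    """Delta_SR = I(S;T,C) - I(S;T) - I(S;C) in bits for a joint table p[s, t, c]."""
    h_s = entropy(p_stc.sum(axis=(1, 2)))
    h_t = entropy(p_stc.sum(axis=(0, 2)))
    h_c = entropy(p_stc.sum(axis=(0, 1)))
    i_s_tc = h_s + entropy(p_stc.sum(axis=0)) - entropy(p_stc)
    i_s_t = h_s + h_t - entropy(p_stc.sum(axis=2))
    i_s_c = h_s + h_c - entropy(p_stc.sum(axis=1))
    return i_s_tc - i_s_t - i_s_c

# Hypothetical XOR code: s = t xor c, with t and c uniform and independent.
xor_code = np.zeros((2, 2, 2))
for t in range(2):
    for c in range(2):
        xor_code[t ^ c, t, c] = 0.25

# Hypothetical copy code: s = t = c.
copy_code = np.zeros((2, 2, 2))
copy_code[0, 0, 0] = copy_code[1, 1, 1] = 0.5

delta_xor = synergy_redundancy(xor_code)
delta_copy = synergy_redundancy(copy_code)
print(delta_xor, delta_copy)
```

In the XOR case neither representation is informative alone, so the whole bit is synergistic; in the copy case the same bit is carried twice, so Δ_SR is negative.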

As an example, consider the data presented in Figure 4 when the neural responses are represented as a
sequence of bursts (B^α). For the case of the simulations (Figure 4A), the time information is I_Sim(S; T^α) =
180.4 ± 0.2 bits/s, and the category information, I_Sim(S; C^α) = 74.2 ± 0.5 bits/s. The synergy/redundancy
term is slightly negative, but not significant (Δ_SR^Sim = -0.4 ± 0.5, two-sided t-test, p(15) = 0.44). By
construction, in the simulation the time and category information are neither redundant nor synergistic. For
the experimental data (Figure 4B), I_Exp(S; T^α) = 63 ± 2 bits/s and I_Exp(S; C^α) = 50.6 ± 0.6 bits/s. In this
case, we do not know beforehand whether the time information and the category information are redundant or
synergistic. Yet, by comparing them with the pattern information we obtain Δ_SR^Exp = 7 ± 3 bits/s, indicating
that timings and categories of patterns are slightly synergistic (two-sided t-test, p(15) = 0.063).
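
As a worked check of Eq. 8, the synergy/redundancy term can be recovered from the central values quoted in the text. The sketch below only performs the subtraction; it ignores the error bars and the statistical tests, and the function name is illustrative.

```python
def delta_sr(i_pattern, i_time, i_category):
    """Synergy/redundancy from Eq. 8: Delta_SR = I(B;S) - I(S;T) - I(S;C)."""
    return i_pattern - i_time - i_category

# Central values in bits/s, as quoted in the text for the representation B^alpha.
delta_sim = delta_sr(254.2, 180.4, 74.2)   # simulation
delta_exp = delta_sr(121.0, 63.0, 50.6)    # experimental data

print(round(delta_sim, 1), round(delta_exp, 1))
```

The simulated value reproduces the reported -0.4 bits/s, and the experimental value (7.4 bits/s) is consistent with the reported 7 ± 3 bits/s.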



The pattern, time and category information depend on the choice of the alphabet of patterns. For
example, the category information may increase or decrease depending on the nature of the aspect defining the
pattern categories (Furukawa and Middlebrooks, 2002; Gollisch and Meister, 2008). No general rules can
be given to predict these changes: they depend on the neural representation at hand. However, when the
alternative pattern representations are linked through functions, some relations between their variations can
be predicted without numerical calculations. Compare, for instance, B^α and B^β as defined in section 3.1.
By grouping all bursts with more than one spike into a single category, not only is B^β a function h_{α→β}
of B^α (B^β = h_{α→β}(B^α)), but also C^β = h_{α→β}(C^α). The time representation remains intact (T^β = T^α).
As a result, neither the pattern information nor the category information can increase, whereas the time
information remains constant. In addition, if T^α and C^α are independent and conditionally independent
given the stimulus, so are T^β and C^β. Therefore, the difference in the category information equals the
difference in the pattern information (I(S; C^α) - I(S; C^β) = I(S; B^α) - I(S; B^β)).
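
The mapping from B^α to B^β can be sketched in a few lines. The encoding is an assumption made for illustration: each time bin holds 0 (no pattern) or the intra-pattern spike count n, and the merged representation labels isolated spikes as 1 and every burst (n ≥ 2) as 2.

```python
def merge_bursts(sequence):
    """Hypothetical h_{alpha->beta}: keep 0 and 1, map every burst (n >= 2) to 2."""
    return [min(x, 2) for x in sequence]

def time_rep(b):
    """Time representation: 1 where a pattern occurs, 0 elsewhere."""
    return [int(x != 0) for x in b]

def category_rep(b):
    """Category representation: temporally ordered categories, empty bins dropped."""
    return [x for x in b if x != 0]

b_alpha = [0, 3, 0, 0, 1, 0, 2, 0, 4, 0]   # illustrative sequence
b_beta = merge_bursts(b_alpha)

print(time_rep(b_beta) == time_rep(b_alpha))                       # T is unchanged
print(category_rep(b_beta) == merge_bursts(category_rep(b_alpha)))  # C is merged
```

Because the same function acts on B and on C while leaving T untouched, any loss of pattern information in this sketch can only come from the categories.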



Analogously, consider a representation B^γ in which the time positions of patterns identified in B^α are
read out with lower precision (2Δt). Since B^γ is a function of B^α, two different responses B_i^α and B_j^α that
differ only slightly in the pattern time positions are indistinguishable in the representation B^γ (B_i^γ = B_j^γ). In
this case, the comparison between B^α and B^γ is analogous to the case analysed in the previous paragraph,
with the roles of the time and category representations interchanged.

We illustrate these results with an example. In Figure 6, the pattern, time and category information
are shown for three different choices of the pattern representation. The simulated neural response is taken
from Figure 4A. In the three cases, there is no synergy or redundancy between the time and the category
information (Δ_SR = 0). From Figure 4A, we already know that I(S; B^β) < I(S; B^α). Comparing the left
and middle panels of Figure 6, we find that this reduction is due to a decrement in the category information
(I(S; C^α) = 74.2 ± 0.5 bits/s, I(S; C^β) = 28.6 ± 0.3 bits/s, one-sided t-test, p(10) < 0.001), as expected
(see section 3.1). In agreement with the theoretical prediction, the time information remains unchanged
(I(S; T^α) = I(S; T^β) = 180.4 ± 0.2 bits/s, one-sided t-test, p(10) = 0.5).




[Figure 6 panels: B^α, B^β, B^γ; legend: pattern information, category information, time information.]

Figure 6: Pattern, time and category information carried by different neural representations. The
spike representation is transformed into a sequence of patterns: B^α (left): grouped in categories according
to the intra-pattern spike count; B^β (middle): classified as isolated spikes or complex patterns; and B^γ
(right): classified as in B^α, reading out the time positions with a lower precision (2Δt). In all cases, error
bars < 1%. The simulation data is taken from Figure 4 (see section 2.4.1, Simulation 1 for details).

Analogously, compare the left and right panels of Figure 6. In this case, both the pattern and time
information decrease (I(S; B^α) = 254.2 ± 0.2 bits/s, I(S; B^γ) = 230.1 ± 0.1 bits/s, I(S; T^α) = 180.4 ±
0.2 bits/s, I(S; T^γ) = 156.0 ± 0.2 bits/s, in both cases, one-sided t-test, p(10) < 0.001), while the category
information remains unchanged (I(S; C^α) = I(S; C^γ) = 74.2 ± 0.5 bits/s, one-sided t-test, p(10) = 0.5).
Thus, as mentioned previously, a reduction in the precision with which the pattern time positions are read out
cannot increase the time information (here it decreases), while keeping the category information constant.



In other examples, the variations in the time and category information may not be directly accompanied
by variations in the pattern information, due to the presence of synergy and redundancy. For example,
Alitto et al. (2005) studied the encoding properties of tonic spikes, long-ISI tonic spikes (tonic spikes
preceded by long ISIs) and bursts. To evaluate the relevance of distinguishing between tonic spikes and
long-ISI tonic spikes, one can compare the information conveyed by two representations: one preserving
the difference between tonic spikes and long-ISI tonic spikes, and another grouping them into the same
category (Gaudry and Reinagel, 2008). The two representations differ only in the category representation, like B^α
and B^β. However, unlike in those representations, their synergy/redundancy terms need not be either equal or
zero, and thus the difference in the pattern information equals the difference in the category information plus
the difference in the synergy/redundancy terms. Indeed, by reading simultaneously the timing
and category of a pattern, the uncertainty on whether the following pattern will be a long-ISI tonic spike
is reduced. Hence, this reduction is a source of redundancy in the representation where the long-ISI tonic spikes are
explicitly identified. On the other hand, the interpattern time interval (IPI) preceding a long-ISI tonic spike
may reveal the duration of the previous pattern. Any information contained in it constitutes a source of
synergy. The distinction between tonic spikes and bursts produces analogous effects on the synergy
and redundancy, affecting both representations.



As shown in Cover and Thomas (1991), I(S; T; C) is symmetric in S, T and C. Hence, Δ_SR is upper and
lower bounded by

$$ -I(X;Y) \le \Delta_{SR} \le I(X;Y|Z), \tag{10} $$

where X, Y and Z represent the variables S, T and C in such an ordering that I(X; Y) = min{I(T; C), I(S; T), I(S; C)}
(see proof in Appendix E). The same ordering applies for both bounds, in such a way that, for example, if
I(S; T|C) is the least upper bound, then -I(S; T) is the greatest lower bound, from the set of bounds derived
in Eq. 10. These bounds are novel and tighter than the bounds previously mentioned in Schneidman et al. (2003).

If the left side of Eq. 10 is zero, time and category information are non-redundant (Δ_SR ≥ 0). However,
they may still be synergistic (0 < Δ_SR), even in the case when they are both zero (I(S; T) = I(S; C) = 0
⇒ Δ_SR ≥ 0). This property has often been overlooked (see, for example, Foffani et al., 2009). Time and
category information are non-synergistic if and only if the right side of Eq. 10 is zero. From the definition
of the synergy/redundancy Δ_SR (Eq. 9), we show that

$$ \Delta_{SR} = 0 \iff I(X;Y) = I(X;Y|Z), \tag{11} $$

where X, Y and Z represent the variables S, T and C in any order. In this case, the time and category
information add up to the pattern information. This situation may occur when either I(X; Y) = I(X; Y|Z) = 0
or I(X; Y) = I(X; Y|Z) > 0 (Schneidman et al., 2003; Nirenberg and Latham, 2003).
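
The bounds in Eq. 10 can be probed numerically. The following sketch draws random joint tables p[s, t, c] (an arbitrary choice, with three states per variable and single-window probabilities) and checks, for every ordering of the variables, that Δ_SR = I(X;Y|Z) - I(X;Y) and that it lies between -I(X;Y) and I(X;Y|Z). It is a sanity check under these assumptions, not a substitute for the proof in Appendix E.

```python
import numpy as np

def entropy(p, axes):
    """Entropy (bits) of the marginal of the joint table p over the given axes."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    m = p.sum(axis=drop) if drop else p
    m = np.asarray(m, dtype=float).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())

def cmi(p, x, y, z=()):
    """Conditional mutual information I(X;Y|Z) in bits; x, y, z are tuples of axes."""
    return entropy(p, x + z) + entropy(p, y + z) - entropy(p, x + y + z) - entropy(p, z)

rng = np.random.default_rng(0)
S, T, C = (0,), (1,), (2,)
bounds_hold = True
for _ in range(200):
    p = rng.random((3, 3, 3))
    p /= p.sum()
    delta = cmi(p, S, T + C) - cmi(p, S, T) - cmi(p, S, C)
    for x, y, z in [(S, T, C), (S, C, T), (T, C, S)]:
        lower, upper = -cmi(p, x, y), cmi(p, x, y, z)
        bounds_hold &= lower - 1e-9 <= delta <= upper + 1e-9
        bounds_hold &= abs(delta - (upper + lower)) < 1e-9
print(bounds_hold)
```

The tightest pair of bounds is the one whose ordering minimises I(X; Y), as stated in the text.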

3.5. Relevance and sufficiency of different aspects of the neural response



Previous studies have addressed the relevance of pattern timing in information transmission by
quantifying the time information and comparing it with the pattern information (Denning and Reinagel, 2005;
Gaudry and Reinagel, 2008; Eyherabide et al., 2009). In other words, the relevance of pattern timing is
given by the amount of information carried by a representation that only preserves the time positions of
patterns. We call this paradigm criterion I. Indeed, one can also address the relevance of pattern categories
using criterion I. However, instead of quantifying the amount of information carried by the category
representation, these previous works have determined the information loss due to ignoring the pattern categories.
Here, this point of view is called criterion II. In what follows, we prove that criterion I and criterion II
take into account different information, and can thus lead to opposite results when both of them are applied
to the same aspect of the response.



Formally, under criterion I, the pattern timing is relevant (or sufficient) for information transmission if

$$ I(S;B) - \Delta I_{th}^{I} \le I(S;T). \tag{12} $$

Here, ΔI_th^I represents a previously set threshold. Although Cover and Thomas (1991) have defined
sufficiency only for the case when ΔI_th^I = 0, in practice, some amount of information loss (ΔI_th^I > 0) is usually
accepted (Nelken and Chechik, 2007). We can also employ this criterion to address the relevance of pattern
categories, comparing

$$ I(S;B) - \Delta I_{th}^{I} \le I(S;C). \tag{13} $$

On the other hand, under criterion II, the pattern categories are relevant to information transmission if

$$ I(S;T) < I(S;B) - \Delta I_{th}^{II}. \tag{14} $$

Therefore, pattern categories are relevant if pattern timings transmit little information, irrespective of the
information carried by categories themselves. Remarkably, if ΔI_th^I = ΔI_th^II, the pattern categories are relevant
(irrelevant) if and only if the pattern timings are irrelevant (relevant) (compare Eqs. 12 and 14).

From the bijectivity between B and (T, C) (see section 3.4), we find that criterion II can be written as

$$ \Delta I_{th}^{II} - \Delta_{SR} < I(S;C). \tag{15} $$

As a result, under criterion II, the relevance of an aspect depends not only on the information conveyed
by that very aspect, as in criterion I, but also on the synergy/redundancy between that aspect and the
complementary ones. Both criteria coincide when ΔI_th^I + ΔI_th^II = I(S; B) + Δ_SR (compare Eqs. 13 and 15),
implying that equality in the thresholds is neither necessary nor sufficient to obtain a coincidence.

By using criterion I for the relevance of pattern timing and criterion II for the relevance of pattern
categories, the information that is repeated in both aspects (redundant information) only contributes to the
relevance of the pattern timing. However, the information that is only available when both aspects are read out
simultaneously (synergistic information) only contributes to the relevance of the pattern categories. The
discrepancies induced in this way are shown in the following example. Consider that I(S; B) = 10 bits/s, I(S; T) = 9 bits/s,
I(S; C) = 10 bits/s and ΔI_th = 2 bits/s. Under criterion II, C is irrelevant because I(S; B) - I(S; T) =
1 bit/s < ΔI_th. Nevertheless, under criterion I, C is necessarily relevant, since it constitutes a sufficient
statistic (I(S; B) = I(S; C)). Analogous results are obtained for the relevance of pattern timing. In addition,
different thresholds are used for the relevance of each aspect (compare Eqs. 12 and 15). In the previous
example, the pattern timing is relevant only if I(S; T) > 8 bits/s whereas the pattern categories are relevant
only if I(S; C) > 2 bits/s, showing an unjustified asymmetry between both aspects.
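
The disagreement between the two criteria in this example can be reproduced directly. The sketch below encodes Eq. 13 (criterion I applied to the categories) and Eq. 14 (criterion II) as two hypothetical helper functions and evaluates them with the numbers of the example, using the same threshold for both criteria.

```python
def relevant_criterion_1(i_pattern, i_aspect, threshold):
    """Criterion I (Eqs. 12-13): the aspect alone retains enough information."""
    return i_pattern - threshold <= i_aspect

def categories_relevant_criterion_2(i_pattern, i_time, threshold):
    """Criterion II (Eq. 14): ignoring the categories loses more than the threshold."""
    return i_time < i_pattern - threshold

# Values of the example, in bits/s.
i_b, i_t, i_c, threshold = 10.0, 9.0, 10.0, 2.0

by_criterion_1 = relevant_criterion_1(i_b, i_c, threshold)
by_criterion_2 = categories_relevant_criterion_2(i_b, i_t, threshold)
delta_sr = i_b - i_t - i_c   # Eq. 8: strongly redundant in this example

print(by_criterion_1, by_criterion_2, delta_sr)
```

The categories are relevant under criterion I and irrelevant under criterion II, and the large negative Δ_SR shows that the disagreement stems from redundancy between T and C.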



3.6. Time and category entropy of the stimulus 



Many studies have interpreted that pattern-based codes function as feature extractors, where the identity of 
each stimulus feature (the what) is represented in the pattern category C, and the timing of each stimulus 
feature (the when), in the pattern temporal reference T (see Introduction and references therein). To assess 
this standard view, we formally define the what and the when in the stimulus, and relate them with the 
time and category information. In the next subsection, we determine the conditions that are necessary 
and sufficient for the standard view to hold. Finally, we show that small category-dependent changes in 
the timing of patterns (such as latencies) may induce departures from the standard view (altering both the 
amount and the composition of the information carried by T and C). 

Since the stimulus S is composed of discrete features (see Methods for a discussion on continuous
stimuli), it can also be written in terms of a time (S_T) and a category (S_C) representation, such that S and
the pair (S_T, S_C) are related through a bijective map. We formally define the what in the stimulus as the
category representation S_C, and the when as the time representation S_T. Indeed, S_T indicates when the
stimulus features occurred, whereas S_C tags what features appeared.



The stimulus entropy is defined as the entropy rate H(S), while the stimulus time entropy and category
entropy are the entropy rates H(S_T) and H(S_C), respectively. The time and category entropies are intimately
related to when and what features happened: they are a measure of the variability in the time positions and
categories of stimulus features, respectively. These quantities were previously defined for Poisson stimuli
in Eyherabide and Samengo (2010), and here these definitions are generalised to encompass any stochastic
stimulus. Since S and (S_T, S_C) are related through a bijective function,

$$ H(S) = H(S_T) + H(S_C) - I(S_T;S_C), \tag{16} $$

where the information rate I(S_T; S_C) is a measure of the redundancy between the time and category entropies
of the stimulus. Since I(S_T; S_C) is always non-negative, S_T and S_C cannot be synergistic.
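
Eq. 16 can be verified on a small example. The sketch assumes a hypothetical joint table p[s_t, s_c] for a single stimulus window, in which the when and the what are positively correlated; entropies are in bits rather than rates.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical joint distribution p[s_t, s_c]; all entries are positive.
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])

h_s = entropy(p)                  # H(S) = H(S_T, S_C), by the bijection
h_t = entropy(p.sum(axis=1))      # H(S_T)
h_c = entropy(p.sum(axis=0))      # H(S_C)

# I(S_T; S_C) computed independently, as a Kullback-Leibler divergence.
product = np.outer(p.sum(axis=1), p.sum(axis=0))
i_tc = float((p * np.log2(p / product)).sum())

print(h_s, h_t + h_c - i_tc)
```

Because the redundancy term is non-negative, the stimulus entropy never exceeds the sum of the time and category entropies.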

The standard view of the role of patterns formally implies that the category information I(S; C) (the
time information I(S; T)) can be reduced to the mutual information I(S_C; C) (I(S_T; T)). Therefore, H(S_C)
and H(S_T) must be upper bounds for the category and time information, respectively. However, these
bounds are not guaranteed by the mere presence of patterns in the neural response. Some cases may be
more complicated because, for example, S_C and S_T may not be independent variables (see section 2.3). A
dependency between these two stimulus properties implies that the what and the when are not separable
concepts.



3.7. The canonical feature extractor 



In this section, we determine the conditions under which the standard interpretation holds: the category
information represents the knowledge on the what in the stimulus, and the time information, the knowledge
on the when. To that aim, we define a canonical feature extractor as a neuron model in which

$$ I(T;S_C|S_T) = 0, \tag{17a} $$
$$ I(C;S_T|S_C) = 0. \tag{17b} $$

Under each of these conditions, the time and category information become, respectively,

$$ I(T;S) = I(T;S_T) \le H(S_T), \tag{18a} $$
$$ I(C;S) = I(C;S_C) \le H(S_C). \tag{18b} $$

Consequently, the response pattern categories represent what stimulus features are encoded, whereas the
pattern time positions represent when the stimulus features occur. In particular, the time and category
information are upper bounded by the stimulus time and category entropies, respectively.

Condition 17a implies that all the information I(S_C; T) is already contained in the information I(S_T; T).
In other words, I(S_C; T) is completely redundant with I(S_T; T), and I(S_C; T) ≤ I(S_T; T). In this sense, we
say that the time information represents the when in the stimulus. Analogous implications can be obtained
from condition 17b for the category representation C, by interchanging T with C, and S_T with S_C (see formal
proof in Appendix F). Therefore, conditions 17 are necessary and sufficient to ensure that the standard view
of the role of patterns in the neural code actually holds (see section 4.1).



A canonical feature extractor does not require T and C to be independent nor conditionally independent 
given the stimulus. In other words, the time and category information may or may not be synergistic 
or redundant, and the timing (category) of each individual pattern may or may not be correlated with 
other pattern time positions (pattern categories) or even with pattern categories (pattern time positions). 
In addition, conditions 17 may also encompass situations in which some information about S_C (S_T) is
carried by T (C), but not by C (T).

In order to see how synergy and redundancy behave in a canonical feature extractor, we replace Eqs. 18
in Eq. 8 and obtain

$$ I(B;S) = I(T;S_T) + I(C;S_C) + \Delta_{SR}. \tag{19} $$

We find that, for a canonical feature extractor, the synergy/redundancy Δ_SR is lower bounded by

$$ -I(S_T;S_C) \le \Delta_{SR} \tag{20} $$

(see proof in Appendix G). In other words, the synergy/redundancy term Δ_SR cannot be more negative than the
redundancy, already present in the stimulus, between the timing and categories of stimulus features.
In addition, the absence of redundancy in the stimulus (I(S_T; S_C) = 0) constrains the neural model to be
non-redundant (Δ_SR ≥ 0).



Consider a neural model in which T = f(S_T; ψ_T) and C = f(S_C; ψ_C), where ψ_T and ψ_C are independent
sources of noise, such that p(ψ_T, ψ_C, S_T, S_C) = p(ψ_T) p(ψ_C) p(S_T, S_C). Thus, T and C are two channels of
information under independent noise (Shannon, 1948; Cover and Thomas, 1991). This model constitutes a
canonical feature extractor. Indeed, T (C) is only related to S_C (S_T) through S_T (S_C), thus complying with
condition 17a (17b). In addition, if S_T and S_C are independent, then T and C constitute independent
channels of information (Cover and Thomas, 1991; Gawne and Richmond, 1993). This model plays a prominent
role in the interpretation of neurons and neural pathways as channels of information (Gawne and Richmond,
1993; Schneidman et al., 2003; Montemurro et al., 2008; Krieghoff et al., 2009), as discussed in section 4.5.
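
The two-channel model above can be checked against conditions 17 and the bound in Eq. 20 with a small exact calculation. The sketch assumes binary variables, a single window, a hypothetical correlated stimulus table and two hypothetical binary symmetric channels standing in for the noisy maps from S_T to T and from S_C to C.

```python
import numpy as np

def entropy(p, axes):
    """Entropy (bits) of the marginal of the joint table p over the given axes."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    m = p.sum(axis=drop) if drop else p
    m = np.asarray(m, dtype=float).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())

def cmi(p, x, y, z=()):
    """Conditional mutual information I(X;Y|Z) in bits; x, y, z are tuples of axes."""
    return entropy(p, x + z) + entropy(p, y + z) - entropy(p, x + y + z) - entropy(p, z)

p_stim = np.array([[0.4, 0.1], [0.1, 0.4]])   # hypothetical p(S_T, S_C), correlated
chan_t = np.array([[0.9, 0.1], [0.1, 0.9]])   # hypothetical p(T | S_T)
chan_c = np.array([[0.8, 0.2], [0.2, 0.8]])   # hypothetical p(C | S_C)

# Joint table with axes 0 = S_T, 1 = S_C, 2 = T, 3 = C.
p = np.einsum('ab,at,bc->abtc', p_stim, chan_t, chan_c)
ST, SC, T, C = (0,), (1,), (2,), (3,)
S = ST + SC

cond_17a = cmi(p, T, SC, ST)                  # I(T; S_C | S_T)
cond_17b = cmi(p, C, ST, SC)                  # I(C; S_T | S_C)
i_t_s, i_t_st = cmi(p, T, S), cmi(p, T, ST)   # both sides of Eq. 18a
delta = cmi(p, S, T + C) - cmi(p, S, T) - cmi(p, S, C)
i_stim = cmi(p, ST, SC)                       # redundancy within the stimulus

print(cond_17a, cond_17b, i_t_s - i_t_st, delta, -i_stim)
```

With a correlated stimulus the model is redundant (Δ_SR < 0) but stays above the bound of Eq. 20; replacing `p_stim` with a product distribution makes Δ_SR vanish.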



The independent channels of information may be regarded as the simplest canonical feature extractor. 
Since T and C are independent and conditionally independent given S, the time and category information 
add up to the pattern information (Δ_SR = 0). An example of this model is shown in Figure 7. In the four
simulations carried out, the neural responses consist of a sequence of four different patterns, associated 
with four different stimulus features, under the presence or absence of temporal jitter and categorical noise 
(see lsection 2.4. II Simulation 2 for a detailed description; Figure 0B shows examples of the different noise 
conditions). In Figure 7, the spike information is omitted because it coincides with the pattern information
(all cases, one-sided t-test, p(10) = 0.5). Indeed, by construction, all the information is transmitted by
patterns, which can be uniquely identified in the response. In agreement with the theoretical results (Eqs.
18), the time and the category information are always upper-bounded by the stimulus time and category
entropy, respectively (all cases, one-sided t-test, p(10) > 0.4).



Without categorical noise With categorical noise 




Stimulus entropy Pattern information 

144443 Stimulus time entropy Time information 

144443 Stimulus category entropy Category information 



Figure 7: Information transmitted by a canonical feature extractor under different noise conditions. 

The left side of each panel shows the stimulus entropy, whereas the right side shows the pattern, time and
category information. In all cases, $\Delta_{SR} = 0$, so the pattern information is equal to the sum of the
category and the time information. From left to right: The addition of categorical noise reduces only the
category information irrespective of the amount of temporal jitter. From top to bottom: The presence of
temporal jitter degrades solely the time information irrespective of the amount of categorical noise. The
pattern information is upper-bounded by the stimulus entropy, the time information by the stimulus time
entropy, and the category information by the stimulus category entropy. In all cases, error bars < 1%. For a
detailed description of the simulation, see section 2.4.1, Simulation 2.

Comparing upper and lower panels of Figure 7, we show that the time information is degraded by the
addition of temporal jitter (both cases, one-sided t-test, p(10) < 0.001), while the category information
remains constant (both cases, one-sided t-test, p(10) > 0.14). Analogously, comparing left and right panels
of Figure 7, we find that the addition of categorical noise decreases the category information (both cases,
one-sided t-test, p(10) < 0.001), while keeping the time information constant (panels A and B,
$I(S; T_A) = 223.3 \pm 0.1$ bits/s, $I(S; T_B) = 222.8 \pm 0.1$ bits/s, one-sided t-test, p(10) = 0.08; panels
C and D, one-sided t-test, p(10) = 0.5). This is expected since, by construction, the categorical noise only
depends on the stimulus categories and affects solely the pattern categories, whereas the temporal jitter
considered here only affects the pattern time positions, irrespective of their categories or the stimulus.

3.8. Departures from the canonical feature extractor



The example shown in Figure 7 turns out to be more complicated if the pattern timing depends on the pattern
category, as occurs in latency codes (Gawne et al., 1996; Furukawa and Middlebrooks, 2002; Chase and Young,
2007; Gollisch and Meister, 2008). Indeed, in those cases, the comparison between the timing of response
patterns and the timing of stimulus features carries information about the stimulus categories
($I(S_C; T|S_T) > 0$). As a result, Eq. 17a does not hold. Latency codes may be an intrinsic property of the
encoding neuron, may result as a consequence of synaptic transmission (Lisman, 1997; Reinagel et al., 1999),
or may arise from the convention used to construct the pattern representation, for example, ascribing the
timing of a pattern as the mean response time, the first or any other spike inside the pattern (Nelken et al.,
2005; Eyherabide et al., 2008). In all these cases, a latency-like dependence between the time positions and
categories of patterns may arise.



To assess the effect of different latencies associated with each pattern category on the neural response,
consider the neural model used in Figure 13 except that now, the pattern latencies vary with the pattern
category $b$ according to $1 + \alpha_\mu (4 - b)$. Here, $\alpha_\mu$ is the latency index, representing the
difference between the latencies of consecutive pattern categories. Three values of $\alpha_\mu$ were
considered: 0, 2 and 4 ms. When $\alpha_\mu = 0$ ms, all patterns have the same latencies. This case was
analysed in Figure 7. As $\alpha_\mu$ increases, so does the latency difference of different patterns.

Due to the deterministic link between the pattern latencies and pattern categories, the pattern
representations ($B^0$, $B^2$ and $B^4$) associated with the different values of $\alpha_\mu$ are related
bijectively. In addition, the category representation does not depend on $\alpha_\mu$. Only the time
representation is altered by a change in the latency index, irrespective of the presence or absence of
temporal jitter and categorical noise. Therefore, any change in the time information is immediately reflected
in the synergy/redundancy term

\begin{align}
\Delta_{SR}^{x} &= I(S; T^{0}) - I(S; T^{x}) \tag{21a} \\
&= -\left[ H(T^{x}) - H(T^{0}) \right] + \left[ H(T^{x}|S) - H(T^{0}|S) \right]. \tag{21b}
\end{align}

Here, $\Delta_{SR}^{x}$ and $T^{x}$ represent the synergy/redundancy term and the time representation,
respectively, for $\alpha_\mu = x$ ms.
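The steps behind Eq. 21 can be spelled out as follows. This is a sketch under two assumptions taken from the surrounding text: the synergy/redundancy term is defined as $\Delta_{SR} = I(S;B) - I(S;T) - I(S;C)$, and the case $\alpha_\mu = 0$ is the canonical one, for which $\Delta_{SR}^{0} = 0$. The bijection between $B^{x}$ and $B^{0}$ gives $I(S;B^{x}) = I(S;B^{0})$, and the category representation $C$ is the same for all $x$.

```latex
\begin{align*}
\Delta_{SR}^{x}
  &= I(S;B^{x}) - I(S;T^{x}) - I(S;C) \\
  &= I(S;B^{0}) - I(S;T^{x}) - I(S;C)
     && \text{($B^{x}$ and $B^{0}$ are related bijectively)} \\
  &= \left[ I(S;T^{0}) + I(S;C) \right] - I(S;T^{x}) - I(S;C)
     && \text{($\Delta_{SR}^{0} = 0$)} \\
  &= I(S;T^{0}) - I(S;T^{x}) \\
  &= \left[ H(T^{0}) - H(T^{0}|S) \right] - \left[ H(T^{x}) - H(T^{x}|S) \right] \\
  &= -\left[ H(T^{x}) - H(T^{0}) \right] + \left[ H(T^{x}|S) - H(T^{0}|S) \right].
\end{align*}
```

The first bracket in the last line collects the change in the total time entropy and the second bracket the change in the time noise entropy.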

The impact of different latencies is twofold. In the first place, the presence of categorical noise
increments the temporal noise through the deterministic link between latencies and categories. Therefore, the
time noise entropy (time information) when $\alpha_\mu > 0$ is greater (less) than that when $\alpha_\mu = 0$.
However, this does not occur when the time and category representations are read out simultaneously. Indeed,
given the category representation, any time representation for $\alpha_\mu = x > 0$ can be univocally
determined from the time representation for $\alpha_\mu = 0$, and vice versa, counteracting the effect of the
temporal noise. Therefore, the variation in the time noise entropy ($H(T^{x}|S) - H(T^{0}|S)$ in Eq. 21) can
be regarded as a source of synergy.






In the second place, the variation in the latencies modifies the inter-pattern time interval distribution,
incrementing the time total entropy (and the time information) when $\alpha_\mu > 0$ with respect to the case
when $\alpha_\mu = 0$. In addition, this variation introduces information about the pattern categories in the
inter-pattern time interval, and consequently it also introduces information about the stimulus identities.
For example, a short interval between two consecutive patterns indicates that the second pattern belongs to a
category with a short latency. In consequence, the increment in the time total entropy ($H(T^{x}) - H(T^{0})$
in Eq. 21) can be regarded as a source of redundancy.
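The redundancy introduced by category-dependent latencies can be illustrated with a minimal toy computation. The sketch below is not the simulation of the paper: it assumes one stimulus feature per trial, two equiprobable onset times (0 and 10 ms), four equiprobable categories, no temporal jitter and no categorical noise, with a latency of `1 + alpha_ms * (4 - category)`. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint):
    """Mutual information in bits from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def latency_informations(alpha_ms):
    """Return (I(S;T), I(S;C), I(S;B), Delta_SR) in bits for the toy model.

    Assumptions: the stimulus is S = (onset, category); the response category
    C equals the stimulus category; the response time is the onset plus a
    latency that depends deterministically on the category.
    """
    joint_t, joint_c, joint_b = (defaultdict(float) for _ in range(3))
    for s_t, s_c in product((0, 10), (1, 2, 3, 4)):
        s, c = (s_t, s_c), s_c
        t = s_t + 1 + alpha_ms * (4 - c)  # category-dependent latency
        joint_t[(s, t)] += 1 / 8
        joint_c[(s, c)] += 1 / 8
        joint_b[(s, (t, c))] += 1 / 8
    i_t, i_c, i_b = (mutual_information(j) for j in (joint_t, joint_c, joint_b))
    return i_t, i_c, i_b, i_b - i_t - i_c


for alpha_ms in (0, 2, 4):
    print(alpha_ms, latency_informations(alpha_ms))
```

In this toy case the pattern and category information do not depend on `alpha_ms`, whereas the time information grows once the latencies differ, so the synergy/redundancy term becomes negative, in line with the noise-free case discussed in the text.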

To illustrate these theoretical inferences, the results of the simulations are shown in Figure 8. As
expected, when $\alpha_\mu = x > 0$, the latencies alter the time information. However, they alter neither the
pattern nor the category information, and thus any variation in the time information is compensated by an
opposite variation in the synergy/redundancy term. Notice that the changes in the time information depend not
only on the latency index, but also on the presence of temporal and categorical noise. Indeed, in the
absence of categorical noise, $H(T^{x}|S) = H(T^{0}|S) = 0$, and thus $\Delta_{SR} < 0$. The effect of the
temporal jitter depends on its distribution as well as on the distribution of the inter-pattern time
intervals, so this analysis is left for future work.

Figure 8: Examples of departures from the behaviour of the canonical feature extractor: The effect
of pattern-category dependent latencies. In all cases, when latencies depend on the pattern category, the 
time information is affected while the category information remains unchanged. Furthermore, the addition 
of categorical noise not only affects the category information but also the time information. In general, how 
the addition of temporal and/or categorical noise affects the time information depends on the latency index, 
as well as on the noise already present in the response. For simulation details, see section 2.4.1,
Simulation 2. The case where $\alpha_\mu = 0$ ms was analysed in Figure 7 and is reproduced here for
comparison. In panel A, the case where $\alpha_\mu = 0$ ms also represents the stimulus entropies, as shown in
Figure 7A. In all cases, error bars < 1%.

In these examples we see that for non-canonical feature extractors, one can no longer say that the
pattern categories represent the what in the stimulus and the pattern timings represent the when, not even
in the absence of synergy/redundancy. As shown in Eq. 21, $\Delta_{SR}$ results from a complex tradeoff between
the effect of categorical noise on the total and noise time response entropies. This tradeoff depends on the
latency index and the amount of temporal noise in the system, as shown in Figure 8.



Latency-like effects may be involved in a translation from a pattern-duration code into an inter-spike
interval code (Reich et al., 2000; Denning and Reinagel, 2005). Indeed, bursts may increase the reliability
of synaptic transmission (Lisman, 1997), making postsynaptic firing more probable at the end of the burst. In
that case, the duration of the burst determines the latency of the postsynaptic firing. In particular, this
indicates that bursts can be simultaneously involved in noise filtering and stimulus encoding, in spite of the
belief that these two functions cannot coexist (Krahe and Gabbiani, 2004). Notice that here, latency codes
have been studied for well-separated stimuli. However, if patterns are elicited close enough in time, they may
interfere in a diversity of manners (Fellous et al., 2004), precluding the code from being read out. Although
we cannot address all these cases in all generality, the framework proposed here is valid to address each
particular case.



4. Discussion 



In this paper, we have focused on the analysis of temporal and categorical aspects, both in the stimulus 
and the response. Our results, however, are also applicable to other aspects. In the case of responses, 
these aspects can be latencies, spike counts, spike timing variability, autocorrelations, etc. Examples of 
stimulus aspects are colour, contrast, orientation, shape, pitch, position, etc. The only requirement is that 
the considered aspects be obtained as transformations of the original representation, as defined in section 2.1
(see section 4.2). The information transmitted by generic aspects can be analysed by replacing $B$ ($S$) with
a vector representing the selected response (stimulus) aspects. The amount of synergy/redundancy between
aspects is obtained from the comparison between the simultaneous and individual readings of the aspects.
In addition, the results can be generalised for aspects defined as statistical (that is, non-deterministic)
transformations of the neural response, or of the stimulus. The data processing inequality also holds in
those cases (Cover and Thomas, 1991).



4.1. Meaning of time and category information and their relation with the what and the when in the 
stimulus 



In this paper, we defined the category and the time information in terms of properties of the neural response.
The category (time) information is the mutual information between the whole stimulus $S$ and the categories
$C$ (timing $T$) of response patterns (see Figure 9A). These definitions only require the neural response to be
structured in patterns. No requirement is imposed on the stimulus, i.e. the stimulus need not be divided
into features. Our definitions, hence, are not symmetric in the stimulus and the response. In some cases,
however, the stimulus is indeed structured as a sequence of features. One may ask how the stimulus identity
(the what) and timing (the when) are encoded in the neural response (see Figure 9B). To that end, we defined
the what in the stimulus in terms of the category representation ($S_C$), and the when, in terms of the time
representation ($S_T$).

Figure 9: Analysis of the role of spike patterns: Relationship with the what and the when in the
stimulus. (A) Categorical and temporal aspects in the neural response. Definitions of time $I(S; T)$ and
category $I(S; C)$ information. (B) Categorical and temporal aspects in the stimulus. Information about the
what $I(S_C; B)$ and the when $I(S_T; B)$ conveyed by the neural response $B$. (C) Analysis of the role of
patterns in the neural response. Mutual information between different aspects of the stimulus and different
aspects of the neural response.



These rigorous definitions allowed us to disentangle how the what and the when in the stimulus are encoded
in the category and time representations of the neural response. We calculated the mutual information
rates between different aspects of the stimulus and different aspects of the neural response (see Figure 9C).

In the standard view, the categories of patterns are assumed to encode the what in the stimulus, and the
timing of patterns, the when (Theunissen and Miller, 1995; Borst and Theunissen, 1999; Martinez-Conde et al.,
2002; Krahe and Gabbiani, 2004; Alitto et al., 2005; Oswald et al., 2007; Eyherabide et al., 2008). These
assumptions have been stated in qualitative terms. There are two different ways in which the standard view
can be formalized as a precise assertion.



On one hand, the standard view can be seen as the assumption that the category (time) representation
only conveys information about the what (the when). Evaluating this assumption involves the comparison
between the information conveyed by the category (time) representation about the whole stimulus (dotted
lines in Figure 9C) with the information that this same representation conveys about the what (the when) in
the stimulus (solid lines in Figure 9C). Formally, this means addressing whether $I(S; C) = I(S_C; C)$ (whether
$I(S; T) = I(S_T; T)$). In this sense, we say that the category (time) information only represents the what
(the when) in the stimulus. A system complying with this first interpretation of the standard view was called
a canonical feature extractor (see section 3.7).



On the other hand, the second way to define the standard view rigorously is to assume that the what 
(the when) is completely encoded by the category (time) representation. Testing this second assumption 
involves the comparison between the information about the what (the when), conveyed by the category 
(time) representation (solid lines in Figure 9C) and by the pattern representation of the neural response
(dashed lines in Figure 9C). Formally, it involves assessing whether $I(S_C; B) = I(S_C; C)$ (whether
$I(S_T; B) = I(S_T; T)$). In this sense, we say that all the information about the what (the when) in the
stimulus is encoded in the category (time) representation of the neural response. A system for which these
equalities hold is called a canonical feature interpreter. It is analogous to the canonical feature extractor,
with the role of the stimulus and the response interchanged (see Appendix H).



The two formalizations of the standard view are complementary. The first one assesses how different 
aspects of the stimulus are encoded in each aspect of the neural response. The second one focuses on
how each aspect of the stimulus is encoded in different aspects of the neural response. Thus, the second 
approach is a symmetric version of the first one. However, a canonical feature extractor might or might not 
be a canonical feature interpreter, and vice versa. A perfect correspondence between the what and the when 
on one side, and pattern timing and categories, on the other, is found for systems that are canonical feature 
extractors and canonical feature interpreters, simultaneously. 
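The two formalizations can be tested mechanically once a joint distribution of stimulus and response aspects is available. The sketch below is a toy illustration under stated assumptions, not the authors' analysis: outcomes are tuples `(s_t, s_c, t, c)`, the model is a noise-free latency code in which the response time is the onset plus `1 + alpha_ms * (4 - category)`, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint):
    """Mutual information in bits from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def latency_model(alpha_ms):
    """Toy joint distribution over outcomes (s_t, s_c, t, c)."""
    return {(s_t, s_c, s_t + 1 + alpha_ms * (4 - s_c), s_c): 1 / 8
            for s_t, s_c in product((0, 10), (1, 2, 3, 4))}


def info(model, stim, resp):
    """I(stim; resp), where stim and resp are functions of an outcome."""
    joint = defaultdict(float)
    for outcome, p in model.items():
        joint[(stim(outcome), resp(outcome))] += p
    return mutual_information(joint)


# Aspects of the stimulus and of the response, read from an outcome tuple.
S = lambda o: (o[0], o[1])
S_T = lambda o: o[0]
S_C = lambda o: o[1]
T = lambda o: o[2]
C = lambda o: o[3]
B = lambda o: (o[2], o[3])


def is_canonical_extractor(model, tol=1e-9):
    """Check I(S;C) = I(S_C;C) and I(S;T) = I(S_T;T)."""
    return (abs(info(model, S, C) - info(model, S_C, C)) < tol
            and abs(info(model, S, T) - info(model, S_T, T)) < tol)


def is_canonical_interpreter(model, tol=1e-9):
    """Check I(S_C;B) = I(S_C;C) and I(S_T;B) = I(S_T;T)."""
    return (abs(info(model, S_C, B) - info(model, S_C, C)) < tol
            and abs(info(model, S_T, B) - info(model, S_T, T)) < tol)


for alpha_ms in (0, 2):
    model = latency_model(alpha_ms)
    print(alpha_ms, is_canonical_extractor(model), is_canonical_interpreter(model))
```

With `alpha_ms = 2` this toy model satisfies the interpreter equalities but not the extractor ones, a small instance of the statement that one property does not imply the other.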



4.2. Two different approaches to the analysis of neural codes 



In order to understand a neural code, one needs to identify those aspects of the neural response that are
relevant to information transmission. To that aim, two different paradigms have been used: criterion
I, assessing the information that one aspect conveys about the stimulus, and criterion II, assessing the
information loss due to ignoring that aspect (see section 3.5). Previous studies have used criterion I
to analyse the relevance of spike counts (Furukawa and Middlebrooks, 2002; Foffani et al., 2009), spike
patterns (Reinagel et al., 1999; Eyherabide et al., 2008; Gaudry and Reinagel, 2008), and pattern timing
(Denning and Reinagel, 2005; Eyherabide et al., 2009). However, when assessing the relevance of the
complementary aspects, such as spike timing and internal structure of patterns, these studies have used
criterion II. As a result, in these studies the relevance of the tested aspect is conditioned to the
irrelevance of the other aspects.



There are cases where building a representation that preserves a definite response aspect is not evident
(nor perhaps possible). Such is the case, for example, when assessing the differential roles of spike timing
and spike count: It is not possible to build a representation preserving the timing of the spikes without
preserving the spike count (see section 3.3). It is instead possible to only preserve the spike count. Since
the spike-count representation is a function of the spike-timing representation, one may argue that there
is an intrinsic hierarchy between the two aspects. The same situation is encountered when evaluating the
information encoded by the pattern representation, as compared to the spike representation (see section 3.1).
There, it was not possible to construct a representation only containing those aspects that had been discarded
in the pattern representation. However, this is not the case when evaluating the differential role between
pattern timing and pattern categories, or the relevance of a specific pattern category.

In the present study, we take advantage of both approaches. Firstly, we notice that pattern timing and
pattern categories are complementary response aspects, and quantify the information preserved by each
aspect (see section 3.4). Then, we determine whether there is synergy or redundancy between the time and
category information, which is formally equivalent to comparing the information preserved by (criterion
I) and lost due to ignoring (criterion II) each of the two aspects. As a result, we gain insight into the
relevance of each aspect as well as how the aspects interact to transmit information (see section 4.4). These
procedures can be extended to encompass any two different aspects of the neural response (see section 4.5).



Notice that the role of correlations, both in time and/or across neurons, has been evaluated using
criterion II (Brenner et al., 2000; Dayan and Abbott, 2001; Nirenberg et al., 2001; Petersen et al., 2002;
Schneidman et al., 2003; Montemurro et al., 2007). However, these authors did not build two complementary
representations of the neural response ignoring and preserving the correlations, as proposed here.
Instead, they ignored correlations by constructing artificial neural responses (or artificial response
probabilities) where different neurons were independent or conditionally independent. Thus, their analysis
involves a comparison between the real and the artificial neural code. Our analysis, instead, is completely
based on complementary reductions of the real neural response. Moreover, in previous studies, the artificial
neural responses are not a transformed version of the real response in a well-defined time window. Thus, in
some cases, the difference between the information with and without preserving correlations is not guaranteed
to be non-negative by the data processing inequality (Cover and Thomas, 1991).



4.3. Representations of the neural response and the data processing inequality 



In some previous studies, the information encoded by different response aspects was assessed, as here, by
transforming each neural response window $R^\tau$ of size $\tau$, through functions, into the pattern
representation $B^\tau$ (Furukawa and Middlebrooks, 2002; Petersen et al., 2002; Nelken et al., 2005;
Gollisch and Meister, 2008). Examples of those response aspects are the first-spike latencies, spike counts,
spike-timing variabilities and first (second, third, etc.) spikes in a pattern. As a result, the information
carried by the individual response aspects cannot be greater than that provided by the neural response in the
same window ($I(S; B^\tau) \le I(S; R^\tau)$), irrespective of the length $\tau$ (see section 2.2). In other
studies, however, the pattern representation $\mathcal{B}$ was obtained by transforming the spike
representation inside a sliding window of variable length: the length of the window depended on the category
of the actual pattern. Then, $\mathcal{B}$ was read out with time windows of size $\tau$. That is the case, for
example, when addressing the information conveyed by inter-spike intervals of length > 38 ms using words of
length $\tau$ = 14.8 ms (Reich et al., 2000) and by patterns of length > 104 ms, > 10 ms and > 56 ms, using
time windows up to 64 ms, 3.2 ms and 16 ms, respectively (Reinagel and Reid, 2000; Eyherabide et al., 2008;
Gaudry and Reinagel, 2008). Unlike the first approach, in this case the data processing inequality does not
apply, since $\mathcal{B}^\tau$ is not a function of $R^\tau$. Therefore, $I(S; \mathcal{B}^\tau)$ can be
larger or smaller than $I(S; R^\tau)$. However, when $\tau \to \infty$, $\mathcal{B}^\tau = B^\tau$, so
asymptotically, both approaches coincide.
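The first situation, in which the reduced representation is a function of the response within the same window, can be checked numerically. The sketch below is an illustration under stated assumptions: the joint distribution of stimuli and 3-bin binary words is random and purely hypothetical, and the reduction used is the spike count of each word. The helper names are not from the paper.

```python
import random
from collections import defaultdict
from math import log2


def mutual_information(joint):
    """Mutual information in bits from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def random_joint(n_stimuli, n_words, seed=0):
    """Hypothetical joint distribution P(s, r) drawn at random and normalised."""
    rng = random.Random(seed)
    weights = {(s, r): rng.random()
               for s in range(n_stimuli) for r in range(n_words)}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def reduce_response(joint, transform):
    """Joint distribution of (s, transform(r)) obtained from that of (s, r)."""
    reduced = defaultdict(float)
    for (s, r), p in joint.items():
        reduced[(s, transform(r))] += p
    return reduced


def spike_count(word):
    """Number of spikes in a binary word encoded as a non-negative integer."""
    return bin(word).count("1")


joint_sr = random_joint(n_stimuli=4, n_words=8)  # 8 words = 3 binary bins
i_words = mutual_information(joint_sr)
i_counts = mutual_information(reduce_response(joint_sr, spike_count))
print(f"I(S;R) = {i_words:.4f} bits, I(S;count) = {i_counts:.4f} bits")
```

Because the spike count is a deterministic function of the word, the second value cannot exceed the first. No such guarantee exists when the reduced representation is built from a window that differs from the one used for the reference response.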



4.4. The role of synergy and redundancy in the search for relevant response aspects 



One of the main goals of the analysis of the neural code is to identify the response aspects that are relevant 
to information transmission. In this context, two important questions arise: how relevant the chosen 
aspects are, and how autonomously they stand. Their relevance to information transmission is assessed 
with information-theoretical measures, as exemplified here with the category and time information (see
section 4.2). Their autonomy refers to whether each aspect transmits information by itself or not, and
whether the transmitted information is shared by other aspects or not. The degree of autonomy is assessed
by quantifying the synergy/redundancy term ($\Delta_{SR}$) between the different aspects.



The concept of synergy/redundancy entails the comparison between the effect of the whole and the 
sum of the individual effects of the constituent parts. The concept requires the constituent parts to be 
univocally determined by the whole, as well as the whole to be completely determined given its constituent 
parts. In other words, the whole and the constituent parts must be related through a bijective function. In 
neuroscience, the synergy/redundancy between groups of neurons has been addressed by comparing the
information carried by the group of neurons (the whole) and the sum of the information of each and every
neuron from the group (the constituent parts) (Brenner et al., 2000; Schneidman et al., 2003). As a result,
$\Delta_{SR}$ can be interpreted as a trade-off between synergy and redundancy (Schneidman et al., 2003).



Intuitively, the presence of synergy ($\Delta_{SR} > 0$) between two aspects indicates that, for many
responses, the aspects must be read out simultaneously in order to obtain information about the stimulus. For
some specific responses, however, one of the aspects may be enough to identify the stimulus. But on average,
aspects cooperate. On the other hand, the presence of redundancy ($\Delta_{SR} < 0$) indicates that, for many
responses, the information conveyed by both aspects overlaps. Therefore, some of the information that can
be extracted from one aspect taken alone can also be extracted from the other aspect taken alone. There
might still be a few individual responses for which it is necessary to read both aspects simultaneously to
obtain information about the stimulus. But on average, messages tend to be replicated in the different
aspects.
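Two textbook extremes make the sign convention concrete. The sketch below assumes the definition $\Delta_{SR} = I(S; A_1, A_2) - I(S; A_1) - I(S; A_2)$ for two generic response aspects $A_1$ and $A_2$; the two toy models (an exclusive-or code and a duplicated code) are illustrations introduced here, not examples from the paper.

```python
from collections import defaultdict
from math import log2


def mutual_information(joint):
    """Mutual information in bits from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def delta_sr(model):
    """Synergy/redundancy term for a model {(s, a1, a2): probability}."""
    joint_1, joint_2, joint_12 = (defaultdict(float) for _ in range(3))
    for (s, a1, a2), p in model.items():
        joint_1[(s, a1)] += p
        joint_2[(s, a2)] += p
        joint_12[(s, (a1, a2))] += p
    return (mutual_information(joint_12)
            - mutual_information(joint_1)
            - mutual_information(joint_2))


# Synergy: a1 is a random bit and a2 = s XOR a1, so neither aspect alone
# says anything about s, but together they determine it.
xor_code = {(s, a1, s ^ a1): 0.25 for s in (0, 1) for a1 in (0, 1)}

# Redundancy: both aspects are copies of the stimulus bit.
copy_code = {(s, s, s): 0.5 for s in (0, 1)}

print("XOR code :", delta_sr(xor_code))
print("copy code:", delta_sr(copy_code))
```

The exclusive-or code yields a positive term (the aspects must be read together), whereas the duplicated code yields a negative one (each aspect repeats the same message).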



In the absence of synergy/redundancy ($\Delta_{SR} = 0$), the aspects might or might not be independent and
conditionally independent given the stimulus (Nirenberg and Latham, 2003; Schneidman et al., 2003). If
they are, then both aspects are fully autonomous. However, if they are not, then synergy and redundancy
coexist. Some responses might require the simultaneous read out of both aspects. However, for other
responses, at least one of the individual aspects might be enough to obtain information about the stimulus.
In this case, by considering both aspects separately, one cannot recover the entire encoded information.



4.5. Applications 

The main ideas in this paper can also be extended to encompass any neural response aspects, different
from pattern timing and pattern category. In particular, they allow us to analyse the information conveyed
by different types of patterns and the synergy/redundancy between them, extending the formalism
derived in Eyherabide et al. (2008). In addition, aspects may also be defined in continuous time
(MacKay and McCulloch, 1952). Even more, any neural population response can be represented as a
sequence of coloured spikes, each colour indicating the neuron that fired the spike (Brown et al., 2004).
Therefore, single neuron codes and population codes can be analysed under the same formalism.



For example, during the last decades, many studies have focused on assessing whether different neurons
transmit information about different stimulus aspects (Gawne et al., 1996; Denning and Reinagel, 2005;
Eyherabide et al., 2008). To that aim, different neurons (and different neural response aspects) have been
interpreted as information channels (Dan et al., 1998; Montemurro et al., 2008; Krieghoff et al., 2009),
often addressing whether they constitute independent channels of information (see section 3.7). However,
these studies have focused on whether the two aspects (or neurons) are independent and conditionally
independent given the stimulus (Gawne and Richmond, 1993; Schneidman et al., 2003). Indeed, in this
case, the response aspects constitute independent channels of information. However, these conditions do
not identify which stimulus aspects are encoded by different neurons, nor do they guarantee that those
stimulus aspects are independent.



To gain insight into the relation between stimulus and response aspects, we determine whether the neuron
constitutes a canonical feature extractor and/or a canonical feature interpreter. For independent channels
of information, the response aspects are canonical feature extractors, canonical feature interpreters, and
also independent and conditionally independent given the stimulus. However, none of these conditions can
be derived from the others. In effect, a canonical feature extractor or a canonical feature interpreter may
or may not exhibit synergy or redundancy between the time and category information (see section 3.7 and
Appendix H). Moreover, even if $T$ and $C$ are independent and conditionally independent given the stimulus,
$T$ ($C$) may still convey information about $S_C$ ($S_T$) once the information about $S_T$ ($S_C$) has been
read out. Formally, each of the equalities defining a canonical feature extractor or a canonical feature
interpreter constitutes a relation between one aspect of the stimulus and one aspect of the response. Such
relations cannot be derived from the independence or conditional independence between two aspects of the
response. For the same reason, the what and the when are not guaranteed to be independent aspects.



Finally, the analysis performed in this work relies on the mutual information between the stimulus and
different aspects of neural response, and thus it is related to both the encoding operation and the decoding
operation (Shannon, 1948; Brown et al., 2004; Nelken and Chechik, 2007). Indeed, the mutual information
is symmetric by definition (Cover and Thomas, 1991). Therefore, one can interpret the information between
the stimulus and a particular aspect of the neural response from both points of view, characterizing both
how the stimulus is encoded into a specific response aspect (Reich et al., 2000; Reinagel and Reid, 2000;
Nelken et al., 2005; Eyherabide et al., 2008; Gollisch and Meister, 2008) and what can be inferred about
the stimulus from a decoder that only decodes that specific aspect. To this aim, an explicit representation of
each response aspect is needed, for example, as defined in section 2.1. Such representations are not always
available, as for example, in studies assessing the role of correlations (see section 4.2).

Appendices

A. Categorical noise 

The categorical noise is characterised by the probability $P_b(b|s)$ that a stimulus feature $s$ elicits a
pattern response of category $b$. This probability is related to $P_r(r|s)$, the probability of inducing the
response $r$ due to the feature $s$, according to

\begin{equation}
P_b(b|s) = \sum_{r:\; b = h_{R \to B}(r)} P_r(r|s), \tag{A-1}
\end{equation}

where the sum runs through all patterns of spikes $r$ whose category is $b$.
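Eq. A-1 amounts to grouping response probabilities by category. The sketch below shows this for one stimulus feature. The response probabilities are hypothetical values chosen for illustration, and the category function `burst_size` (the number of spikes in a binary word) is only an assumed stand-in for the transformation from spikes to pattern categories.

```python
from collections import defaultdict


def category_probabilities(p_r_given_s, category_of):
    """P_b(b|s): sum of P_r(r|s) over the responses r whose category is b."""
    p_b = defaultdict(float)
    for r, p in p_r_given_s.items():
        p_b[category_of(r)] += p
    return dict(p_b)


def burst_size(word):
    """Assumed category function: number of spikes in a binary word."""
    return word.count("1")


# Hypothetical P_r(r|s) for a single stimulus feature s.
p_r_given_s = {"0100": 0.5, "0010": 0.2, "0110": 0.2, "1110": 0.1}

p_b_given_s = category_probabilities(p_r_given_s, burst_size)
print(p_b_given_s)
```

Responses that differ only in the position of a single spike fall into the same category here, so their probabilities add up in $P_b(b|s)$.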



B. Event counts transmit information at a vanishing rate 



Previous studies have shown that the information per unit time carried by the spike count decreases with
the size of the response time window (Petersen et al., 2002; Montemurro et al., 2007). In this appendix,
we formally prove this result and also that the information per unit time vanishes in the limit of long
windows. We extend its validity not only for spikes, but for any response patterns, as defined in section 2.1,
irrespective of the number of pattern categories. To that aim, consider a representation $\eta$ that only
preserves the number of patterns in each response segment $R^\tau$ of length $\tau$ (Figure [3K). In this
representation, two response stretches $R_1^\tau$ and $R_2^\tau$ are different if and only if they contain a
different number of patterns ($\eta(R_1^\tau) \neq \eta(R_2^\tau)$); otherwise they are equal.

In a real experiment, patterns (and spikes) are not instant (MacKay and McCulloch, 1952). Thus, without
loss of generality, consider the time divided into time bins of size $\Delta t$ shorter than the shortest
pattern. The number of events present in any response stretch $R^w$ of length $w$ bins is bounded by
$0 \le \eta^w \le w$, and therefore $H(\eta^w) \le \log(w + 1)$. Hence, the entropy rate $H(\eta)$ becomes
zero, since

\begin{equation}
H(\eta) = \lim_{w \to \infty} \frac{H(\eta^w)}{w} \le \lim_{w \to \infty} \frac{\log(w + 1)}{w} = 0. \tag{B-1}
\end{equation}

As a result, the information rate carried by $\eta$ about any other random variable vanishes. In particular,
$I(S; \eta) \le H(\eta) = 0$. The result is valid for response patterns of any nature (see section 2.1 for the
definition and examples of patterns).
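The decay of the bound in Eq. B-1 is easy to tabulate. The sketch below evaluates $\log_2(w + 1)/w$, in bits per bin, for increasing window lengths; the function name is a hypothetical helper and the choice of base 2 is an assumption made for the illustration.

```python
from math import log2


def count_rate_bound(w):
    """Upper bound log2(w + 1) / w, in bits per bin, for a window of w bins."""
    return log2(w + 1) / w


for w in (1, 10, 100, 1000, 10000):
    print(f"w = {w:>5} bins: bound = {count_rate_bound(w):.5f} bits/bin")
```

The bound starts at one bit per bin for a single bin and decreases monotonically towards zero, since the logarithm grows more slowly than the window length.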

C. The response set of event categories transmits information at a vanishing rate 



In this appendix, we prove that the information per unit time transmitted by the response set of pattern 
categories decreases with the length of the response time window, and it vanishes in the limit of long time 
windows. To this aim, we consider a representation in which two response segments are indistinguishable 
if and only if they have the same pattern categories, irrespective of their temporal ordering (see Figure |5j3). 
Hence, two neural responses can be different in the category representation and equal in the representa- 



tion. Analogously to Appendix B we only consider that the response events are not instant. 



We first prove the result for the case where the number $|S_b|$ of possible different pattern categories is
finite; a neural response $B$ may be composed of several response patterns. This is indeed the most frequent
situation in real neural systems (MacKay and McCulloch, 1952), and it is valid for all the examples of pattern-based
codes mentioned in section 2.1 and throughout this paper. Consider that the neural response is read with
words of length $w$ bins, each bin being shorter than the shortest pattern. The number of patterns is bounded by
$0 \le \eta_w \le w$ (see Appendix B). In addition, each response pattern may belong to one out of $|S_b|$ pattern
categories. Thus, the number of possible different responses $\Theta_w$ in the $\Theta$ representation is upper-bounded by
$|\Theta_w| \le (w + 1)^{|S_b|}$ (Cover and Thomas, 1991). As a result, its entropy is upper-bounded by
$H(\Theta_w) \le \log |\Theta_w|$, and its entropy rate is



$$H(\Theta) = \lim_{w \to \infty} \frac{H(\Theta_w)}{w} \le \lim_{w \to \infty} \frac{|S_b| \log (w + 1)}{w} = 0 . \tag{C-1}$$



Therefore, there is no mutual information rate between the response set of pattern categories and any other
random variable. In particular, $I(S; \Theta) = 0$.
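
The counting argument can be checked with a short sketch. Assuming only that a window of $w$ bins holds at most $w$ patterns, the number of unordered category sets is the number of multisets of size at most $w$ drawn from the available categories, which is a binomial coefficient. The function names and the choice of three categories are hypothetical and serve only as an illustration.

```python
import math

def count_category_sets(w: int, n_categories: int) -> int:
    """Number of multisets with at most w patterns drawn from n_categories categories."""
    return math.comb(w + n_categories, n_categories)

def type_bound(w: int, n_categories: int) -> int:
    """Looser bound (w + 1) ** n_categories used in the text."""
    return (w + 1) ** n_categories

def category_set_rate(w: int, n_categories: int) -> float:
    """Upper bound on the entropy rate of the category set, in bits per bin."""
    return math.log2(count_category_sets(w, n_categories)) / w

if __name__ == "__main__":
    n = 3  # hypothetical number of pattern categories
    for w in (10, 100, 1000):
        print(w, count_category_sets(w, n), type_bound(w, n), round(category_set_rate(w, n), 4))
```

The exact count never exceeds $(w + 1)^{|S_b|}$, and both grow only polynomially in $w$, which is why the rate vanishes.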
We now generalise the result to infinite codes, under the only condition that patterns belonging to
different categories have different durations. These codes can be regarded as academic examples since,
in any real condition, they would be impractical due to the long time periods required to read out the
codewords. Examples of such infinite codes are burst codes with no restriction on their duration,
inter-spike intervals or latencies divided into an infinite number of finite ranges, and the number of spikes in
arbitrarily long response windows. In a neural response of size $w$ bins, only patterns up to a length
$w$ can be read (see section 4.3 for examples). In addition, a neural response may contain several patterns.
Thus, the sum of the lengths of the patterns cannot be greater than the length of the response containing
them. Under these conditions, the number of response sets of pattern categories $|\Theta_w|$ is upper-bounded by



$$|\Theta_w| \le \sum_{k=0}^{w} p(k) , \tag{C-2}$$



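where $p(k)$ represents the number of partitions of the integer $k$. By using the Hardy-Ramanujan-Uspensky
asymptotic approximation (Apostol, 1990),

$$|\Theta_w| \le \sum_{k=0}^{w} p(k) \approx C + \sum_{k=k_0}^{w} \frac{e^{A \sqrt{k}}}{B k} \le w \, e^{A \sqrt{w}} , \tag{C-3}$$

where $A$ and $B$ are positive constants (for the partition function, $A = \pi \sqrt{2/3}$ and $B = 4 \sqrt{3}$), $k_0$
represents an integer above which the approximation is valid, $C = \sum_{k=0}^{k_0 - 1} p(k)$, and the right-most
inequality is valid for long enough words. Therefore, the entropy rate $H(\Theta)$ is

\begin{align}
H(\Theta) &= \lim_{w \to \infty} \frac{H(\Theta_w)}{w} \tag{C-4a} \\
&\le \lim_{w \to \infty} \frac{\log w + A \sqrt{w}}{w} \tag{C-4b} \\
&= 0 . \tag{C-4c}
\end{align}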



Thus, the entropy rate $H(\Theta)$ tends to zero, and consequently the mutual information rate that the response
set of pattern categories can carry about any other random variable vanishes.
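
The sub-exponential growth of the partition sum can also be verified numerically. The sketch below computes $p(k)$ with a standard dynamic-programming recursion (an implementation choice made here, not taken from the paper) and evaluates $\log_2 \sum_{k \le w} p(k) / w$ for a few window lengths; the function names are hypothetical.

```python
import math

def partition_numbers(n_max: int) -> list:
    """Return [p(0), ..., p(n_max)], the numbers of integer partitions."""
    p = [1] + [0] * n_max
    # Add parts one size at a time, as in the classic coin-change recursion.
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p

def partition_rate_bound(w: int) -> float:
    """Bound log2(sum_{k <= w} p(k)) / w on the entropy rate, in bits per bin."""
    return math.log2(sum(partition_numbers(w))) / w

if __name__ == "__main__":
    for w in (25, 100, 400):
        print(f"w = {w:>3d} bins -> bound = {partition_rate_bound(w):.4f} bits/bin")
```

Since $\log p(k)$ grows as $\sqrt{k}$, the bound decays roughly as $1 / \sqrt{w}$, more slowly than in the finite case but still to zero.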
D. Decomposition of the pattern information



As mentioned previously, the pattern sequence B and the pair (T, C) carry the same information about the 
stimulus, since they are related through a bijective transformation. Therefore 



\begin{align}
I(S; B) &= I(S; T, C) \tag{D-1a} \\
&= I(S; T) + I(S; C|T) + I(S; C) - I(S; C) \tag{D-1b} \\
&= I(S; T) + I(S; C) - \left( I(S; C) - I(S; C|T) \right) \tag{D-1c} \\
&= I(S; T) + I(S; C) - I(S; T; C) \tag{D-1d} \\
&= I(S; T) + I(S; C) + \Delta_{SR} ; \tag{D-1e}
\end{align}



where Eq. D-1e is obtained from Eq. D-1d by replacing $\Delta_{SR} = -I(S; T; C)$. Here, $I(X; Y; Z) = I(X; Y) -
I(X; Y|Z)$ represents the triple mutual information (Cover and Thomas, 1991; Tsujishita, 1995).
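
The decomposition in Eq. D-1 can be checked on a toy example. The sketch below assumes a hypothetical code in which single binary symbols $t$ and $c$ are independent and uniform and the stimulus is $s = t \oplus c$; this distribution is chosen purely for illustration and is not taken from the paper. Neither $T$ nor $C$ alone informs about $S$, so all the information is synergistic.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(joint, axes):
    """Entropy (bits) of the marginal of `joint` over the given tuple of axes."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[a] for a in axes)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def mutual_info(joint, x, y, z=()):
    """I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z); x, y, z are tuples of axis indices."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))

# Outcomes are (s, t, c) with s = t XOR c (hypothetical toy distribution).
S, T, C = (0,), (1,), (2,)
joint = {(t ^ c, t, c): 0.25 for t, c in product((0, 1), repeat=2)}

i_total = mutual_info(joint, S, T + C)            # I(S; T, C)
i_time = mutual_info(joint, S, T)                 # I(S; T)
i_cat = mutual_info(joint, S, C)                  # I(S; C)
delta_sr = mutual_info(joint, S, T, C) - i_time   # -I(S; T; C)

if __name__ == "__main__":
    print(i_total, i_time, i_cat, delta_sr)
```

In this example $I(S; T) = I(S; C) = 0$ while $I(S; T, C) = 1$ bit, so $\Delta_{SR} = 1$ bit and Eq. D-1e holds with a purely synergistic term.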



E. Relation between the upper- and lower-bounds of $\Delta_{SR}$



The synergy/redundancy term ($\Delta_{SR}$), defined in Eq. 9, can be written as

\begin{align}
\Delta_{SR} &= I(S; T|C) - I(S; T) \tag{E-2a} \\
&= I(S; C|T) - I(S; C) \tag{E-2b} \\
&= I(T; C|S) - I(T; C) . \tag{E-2c}
\end{align}



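Hence, the upper- and lower-bounds of $\Delta_{SR}$ are

$$-\min \left\{ \begin{array}{c} I(T; C) \\ I(S; T) \\ I(S; C) \end{array} \right\} \le \Delta_{SR} \le \min \left\{ \begin{array}{c} I(T; C|S) \\ I(S; T|C) \\ I(S; C|T) \end{array} \right\} . \tag{E-3}$$

In addition, these upper- and lower-bounds are related through Eqs. E-2 in such a way that

$$\min \left\{ \begin{array}{c} I(T; C) \\ I(S; T) \\ I(S; C) \end{array} \right\} = I(X; Y) \iff \min \left\{ \begin{array}{c} I(T; C|S) \\ I(S; T|C) \\ I(S; C|T) \end{array} \right\} = I(X; Y|Z) , \tag{E-4}$$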



where $X$, $Y$ and $Z$ represent the variables $S$, $T$ and $C$ in any order. This proves the upper- and lower-bounds
for the synergy/redundancy term $\Delta_{SR}$ of Eq. 10.
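
A numerical check of Eqs. E-2 to E-4 can be sketched as follows. The joint distribution over $(s, t, c)$ is drawn at random, which is an illustrative assumption; the alphabet sizes, seeds and function names are likewise hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import product

def entropy(joint, axes):
    """Entropy (bits) of the marginal of `joint` over the given tuple of axes."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[a] for a in axes)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def mutual_info(joint, x, y, z=()):
    """I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z); x, y, z are tuples of axis indices."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))

def random_joint(sizes, seed):
    """Random strictly positive joint distribution over (s, t, c); purely illustrative."""
    rng = random.Random(seed)
    weights = {o: rng.random() + 1e-3 for o in product(*(range(n) for n in sizes))}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}

def synergy_report(joint):
    """Return the three forms of Delta_SR plus the terms entering the bounds of Eq. E-3."""
    S, T, C = (0,), (1,), (2,)
    triples = [(T, C, S), (S, T, C), (S, C, T)]      # (X, Y, Z) orderings of Eq. E-3
    plain = [mutual_info(joint, x, y) for x, y, _ in triples]
    cond = [mutual_info(joint, x, y, z) for x, y, z in triples]
    deltas = [c - p for c, p in zip(cond, plain)]    # Eqs. E-2c, E-2a, E-2b
    return deltas, plain, cond

if __name__ == "__main__":
    deltas, plain, cond = synergy_report(random_joint((3, 2, 2), seed=0))
    print("Delta_SR (three forms):", deltas)
    print("bounds:", -min(plain), min(cond))
```

Because each conditional term equals the corresponding unconditional term plus the same $\Delta_{SR}$, the pair of variables that attains the minimum is the same on both sides, as stated in Eq. E-4.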
F. Information decomposition in representations of the neural response

The information that a representation $X$ of the neural response conveys about the stimulus can be decomposed as

$$I(S; X) = I(S_T; X) + I(S_C; X) + \Delta^{X}_{SR} ; \tag{F-1}$$

where $I(S_T; X)$ is the information conveyed by $X$ about the when in the stimulus, and $I(S_C; X)$ is the
information conveyed by $X$ about the what. Here, $\Delta^{X}_{SR}$ represents the synergy/redundancy between the
information conveyed about the when and the what, and it is given by

$$\Delta^{X}_{SR} = -I(S_T; S_C; X) ; \tag{F-2}$$

which is lower-bounded by the redundancy in the stimulus

$$\Delta^{X}_{SR} \ge -I(S_T; S_C) . \tag{F-3}$$

Tighter upper and lower bounds for $\Delta^{X}_{SR}$ can be derived analogously to the ones derived for $\Delta_{SR}$ (see Eq.
10), as well as analogous conditions for the absence of either synergy or redundancy between $I(S_T; X)$ and
$I(S_C; X)$.

Notice that when $\Delta^{X}_{SR} > 0$, the information provided about the stimulus is greater than the sum of the
information about when and what stimulus features happen, i.e.

$$I(S; X) > I(S_T; X) + I(S_C; X) . \tag{F-4}$$

This may occur, for example, if the latency in the response depends on the feature category. In this case, the
information that the time representation $T$ carries about the time positions of stimulus features $S_T$ might be
increased by the knowledge of the feature categories $S_C$. In conclusion, there would be an information
component that is not uniquely related to either when or what: it refers to both.

In the case that $I(S_C; X|S_T) = 0$, the synergy/redundancy $\Delta^{X}_{SR}$ becomes

$$\Delta^{X}_{SR} = -I(S_C; X) . \tag{F-5}$$

Thus, $I(S_C; X)$ is completely redundant with and lower than $I(S_T; X)$. That is,

\begin{align}
I(S; X) &= I(S_T; X) + I(S_C; X) + \Delta^{X}_{SR} \tag{F-6a} \\
I(S_C; X) + I(S_T; X|S_C) &= I(S_T; X) \tag{F-6b} \\
I(S_C; X) &\le I(S_T; X) . \tag{F-6c}
\end{align}



Notice that this is the case of the time representation in a canonical feature extractor. Indeed, the
definition of a canonical feature extractor states that $I(S_C; T|S_T) = 0$, and consequently

$$\Delta^{T}_{SR} = -I(S_C; T) . \tag{F-7}$$

The implications of condition 17a mentioned in section 3 follow directly from this equation. By
interchanging $S_T$ and $S_C$ in Eqs. F-5 and F-6, analogous conclusions can be derived for the category
representation.
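
The fully redundant case of Eqs. F-5 and F-6 can be illustrated with a small sketch. It assumes a hypothetical stimulus in which $s_C$ is a noisy copy of $s_T$, and a response $x$ that is another noisy copy of $s_T$, independent of $s_C$ given $s_T$, so that $I(S_C; X|S_T) = 0$ by construction. The probabilities and names are illustrative assumptions only.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(joint, axes):
    """Entropy (bits) of the marginal of `joint` over the given tuple of axes."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[a] for a in axes)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def mutual_info(joint, x, y, z=()):
    """I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z); x, y, z are tuples of axis indices."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))

def extractor_joint(p_cat=0.9, p_resp=0.8):
    """p(s_t, s_c, x): s_c and x are noisy copies of s_t, independent given s_t."""
    joint = {}
    for s_t, s_c, x in product((0, 1), repeat=3):
        p_c = p_cat if s_c == s_t else 1 - p_cat
        p_x = p_resp if x == s_t else 1 - p_resp
        joint[(s_t, s_c, x)] = 0.5 * p_c * p_x
    return joint

ST, SC, X = (0,), (1,), (2,)
joint = extractor_joint()
i_when = mutual_info(joint, ST, X)                 # I(S_T; X)
i_what = mutual_info(joint, SC, X)                 # I(S_C; X)
i_what_given_when = mutual_info(joint, SC, X, ST)  # I(S_C; X | S_T), zero here
delta_x = i_what_given_when - i_what               # Eq. F-2 rewritten with conditional terms
i_total = mutual_info(joint, ST + SC, X)           # I(S; X) with S = (S_T, S_C)

if __name__ == "__main__":
    print(i_when, i_what, delta_x, i_total)
```

In this toy case everything the response says about the what is already contained in what it says about the when, so $I(S; X) = I(S_T; X)$ and the synergy/redundancy term is purely redundant.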

G. Redundancy bounds for the canonical feature extractor 

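To prove Eq. 20 we expand

\begin{align}
I(T, S_T; C, S_C) &= I(S_T; S_C) + \underbrace{I(T; S_C|S_T)}_{=0 \text{ (Eq. 17a)}} + \underbrace{I(S_T; C|S_C)}_{=0 \text{ (Eq. 17b)}} + I(T; C|S_T, S_C) \tag{G-1a} \\
&= I(T; C) + \underbrace{I(S_T; C|T)}_{\ge 0} + \underbrace{I(T; S_C|C)}_{\ge 0} + \underbrace{I(S_T; S_C|T, C)}_{\ge 0} . \tag{G-1b}
\end{align}

Applying conditions 17a and 17b, the second and third terms of Eq. G-1a vanish, respectively. Since the
three right-most terms of Eq. G-1b are non-negative, $I(T; C) \le I(S_T; S_C) + I(T; C|S_T, S_C)$. Taking the
stimulus $S$ as the pair $(S_T, S_C)$, as in Eq. F-1, and writing $\Delta_{SR} = I(T; C|S) - I(T; C)$ (Eq. E-2c),
the synergy/redundancy between the time and category information ($\Delta_{SR}$) is lower-bounded by

$$-I(S_T; S_C) \le \Delta_{SR} . \tag{G-2}$$

This is the bound that we wanted to prove.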
H. The canonical feature interpreter 

We define a canonical feature interpreter as a neuron model in which 
\begin{align}
I(C; S_T|T) &= 0 \tag{H-3a} \\
I(T; S_C|C) &= 0 . \tag{H-3b}
\end{align}

Under each of these conditions, the information conveyed by the neural response $B$ about the what ($I(B; S_C)$)
and about the when ($I(B; S_T)$) becomes

\begin{align}
I(B; S_T) &= I(T; S_T) \le H(S_T) \tag{H-4a} \\
I(B; S_C) &= I(C; S_C) \le H(S_C) . \tag{H-4b}
\end{align}

Consequently, the what (the when) in the stimulus is completely represented in the category (time)
representation. In other words, $I(S_C; T)$ ($I(S_T; C)$) is completely redundant with $I(S_C; C)$ ($I(S_T; T)$), and
$I(S_C; T) \le I(S_C; C)$ ($I(S_T; C) \le I(S_T; T)$). The canonical feature interpreter is analogous to the canonical
feature extractor. In fact, it can be obtained by interchanging the role of the stimulus and the response in
section 3.